The Information retrieval GEneral Reading (TIGER) Group

CIDDA IR Group, RMIT University

Andrew Turpin founded TIGER in 2006 and organized weekly meetings for quite some time.

Students, staff, and interested parties outside the university are welcome to attend.

We encourage all members to suggest new research topics and to guide and participate in the weekly discussions.

For inquiries about joining TIGER, organizing TIGER talks, or guiding a discussion, please contact Sachin Pathiyan Cherumanal via email (sachin.pathiyan.cherumanal@student.rmit.edu.au) or Elisa Mena-Maldonado (firstname dot lastname at student dot rmit dot edu dot au).

TIGER Talks

Talks are held every Thursday from 12:30 to 13:30 (GMT+10, Melbourne, Australia time) unless another time is specified.

June
Date Speaker Title Abstract
29/06/2022 Alessandro Benedetti Dense Retrieval with Apache Solr Neural Search Neural Search is an industry derivation from the academic field of Neural Information Retrieval. More and more frequently, we hear about how Artificial Intelligence (AI) permeates every aspect of our lives, and this also includes software engineering and Information Retrieval. In particular, the advent of Deep Learning introduced the use of deep neural networks to solve complex problems that could not be solved simply by an algorithm. Deep Learning can be used to produce a vector representation of both the query and the documents in a corpus of information. Search, in general, comprises four primary steps: (1) generate a representation of the query that describes the information need; (2) generate a representation of the document that captures the information contained in it; (3) match the query and the document representations from the corpus of information; (4) assign a score to each matched document in order to establish a meaningful document ranking by relevance in the results. With the Neural Search module, Apache Solr is introducing support for neural network based techniques that can improve these four aspects of search. This talk explores the first official contribution of Neural Search capabilities available from Apache Solr 9.0 (May 2022): Approximate K-Nearest Neighbor Vector Search for matching and ranking. You will learn: how Approximate Nearest Neighbor (ANN) approaches work, with a focus on Hierarchical Navigable Small World Graph (HNSW); how the Apache Lucene implementation works; how the Apache Solr implementation works, with the new field type and query parser introduced; and how to run KNN queries and how to use them to rerank a first-stage pass. Join us as we explore this new Apache Solr feature!
23/06/2022 Leila Tavakoli MIMICS-Duo: Offline & Online Evaluation of Search Clarification Asking clarification questions is an active area of research; however, resources for training and evaluating search clarification methods are not sufficient. To address this issue, we describe MIMICS-Duo, a new freely available dataset of 306 search queries with multiple clarifications (a total of 1,034 query-clarification pairs). MIMICS-Duo contains fine-grained annotations on clarification questions and their candidate answers and enhances the existing MIMICS datasets by enabling multi-dimensional evaluation of search clarification methods, including online and offline evaluation. We conduct extensive analysis to demonstrate the relationship between offline and online search clarification datasets and outline several research directions enabled by MIMICS-Duo. We believe that this resource will help researchers better understand clarification in search.
09/06/2022 Chenglong Ma Evaluation of Herd Behavior Caused by Population-scale Concept Drift in Collaborative Filtering Concept drift in stream data has been well studied in machine learning applications. In the field of recommender systems, this issue is also widely observed, where it is also known as temporal dynamics in user behavior. Furthermore, in the context of COVID-19 pandemic related contingencies, people shift their behavior patterns drastically and tend to imitate others' opinions. The changes in user behavior may not always be rational. Thus, irrational behavior may impair the knowledge learned by the algorithm. It can cause herd effects and aggravate the popularity bias in recommender systems. However, related research usually pays attention to the concept drift of individuals and overlooks the synergistic effect among users in the same social group. We conduct a study on user behavior to detect the collaborative concept drifts among users. Also, we empirically study how increased experience of individuals can weaken herding effects. Our results suggest that CF models are highly impacted by herd behavior, and our findings could provide useful implications for the design of future recommender algorithms.
02/06/2022 Lisa Given The Challenges of Mythical User Design: Lessons from the Wine Industry The proliferation of inexpensive, template-driven, and easy-to-access website and app designs has opened information sharing and portal creation to people across groups and organisations. With the increasing focus on research impact, many researchers now include the design of web portals and other tools in grant applications, as mechanisms to share outcomes with industry and the public. And community organisations, government agencies, and small businesses have embraced web tools to reach customers and constituents. Unfortunately, such easy access to information sharing “solutions” risks marginalising users when templates are adopted uncritically, when potential users are not involved in design processes, and when users’ post-launch concerns are not addressed through redesign. This presentation explores these issues by sharing results from an ARC Linkage Project that examined websites and smartphone apps created for use in the wine industry. The studies used qualitative, user-focused approaches to understand winemakers’ information needs and to assess the viability of existing tools to support those needs. The results demonstrate the need for interdisciplinary, collaborative co-design practices to fully support users’ technological needs and expectations.
April
Date Speaker Title Abstract
08/04/2022 Melwin Pais (AWS) and Emma Arrigo (AWS) Personalisation in practice In this session, we will learn how you can use ML to generate recommendations for users based on their preferences and behaviour, re-rank results in a personalised way, personalise content for emails, or create targeted marketing campaigns based on user segments. We will review personalisation use cases in retail and entertainment. You will get access to a live demo and source code to experience it yourself and build personalisation flows for your users.
March
Date Speaker Title Abstract
31/03/2022 Shoujin Wang Veracity-aware and Event-driven Personalized News Recommendation for Fake News Mitigation Despite the tremendous efforts by social media platforms and fact-checking services for fake news detection, fake news and misinformation still spread wildly on social media platforms (e.g., Twitter). Consequently, fake news mitigation strategies are urgently needed. Most of the existing work on fake news mitigation focuses on overall mitigation across a whole social network while neglecting to develop concrete mitigation strategies to deter individual users from sharing fake news. In this paper, we propose a novel veracity-aware and event-driven recommendation model to recommend personalised corrective true news to individual users for effectively debunking fake news. Our proposed model Rec4Mit (Recommendation for Mitigation) not only effectively captures a user’s current reading preference with a focus on which event, e.g., US election, from her/his recent reading history containing true and/or fake news, but also accurately predicts the veracity (true or fake) of candidate news. As a result, Rec4Mit can recommend the most suitable true news to best match the user’s preference as well as to mitigate fake news. In particular, for those users who have read fake news of a certain event, Rec4Mit is able to recommend the corresponding true news of the same event. Extensive experiments on real-world datasets show Rec4Mit significantly outperforms the state-of-the-art news recommendation methods in terms of the capability to recommend personalized true news for fake news mitigation.
24/03/2022 Dana McKay and George Buchanan The lowest form of flattery: Scholarly plagiarism and its prevention We know that plagiarism is a big problem in student work, but how prevalent is it among publishing academics? Given that confirmed plagiarism is career-ending, there is significant incentive to address it informally, even when it is detected. This results in an air of mystique, and a fear among many academics that they may inadvertently find themselves involved in a plagiarism case. So how does it occur, and what can we do about it? In this talk we will present research into the prevalence of plagiarism in the digital libraries community (a community that arguably should know better). We will use this data to demonstrate that plagiarism occurs within specific research groups. We will follow this up with results from a study of research leaders from groups not engaged in plagiarism, discussing the strategies they use to promote good academic culture within their groups.
10/03/2022 Elham Naghizade Personalisation versus Privacy in an Interconnected World We are rapidly moving towards a more connected future through our social networks and smart devices. The paths we take, the food we order, the movies we watch, and even the time we take a break and the people we befriend are all being recommended based on our personal preferences and needs. Harnessing this rich personal data has the potential to improve our lives; however, sharing such rich and often sensitive data raises considerable risks to our control over privacy. In this talk, I will discuss two views: the individual as a public entity involved in a participatory sensing network, and the individual as a private entity using personal data analytics for health and wellbeing benefits. I will discuss a range of algorithms that aim to either challenge or provide personalised privacy and enable privacy-enhancing analytics in these scenarios.
03/03/2022 Dilini Rajapaksha LIMREF: Local Interpretable Model Agnostic Rule-based Explanations for Forecasting, with an Application to Electricity Smart Meter Data Accurate electricity demand forecasts play a crucial role in sustainable power systems. To enable better decision-making, especially for demand flexibility of the end user, it is necessary to provide not only accurate but also understandable and actionable forecasts. In terms of accuracy, Global Forecasting Models (GFM) trained across time series have recently shown superior results in many demand forecasting competitions and real-world applications, compared with univariate forecasting approaches. We aim to fill the gap between accuracy and interpretability in global forecasting approaches. In order to explain the global model forecasts, we propose Local Interpretable Model-agnostic Rule-based Explanations for Forecasting (LIMREF), a local explainer framework that produces k-optimal impact rules for a particular forecast, considering the global forecasting model as a black-box model, in a model-agnostic way. It provides different types of rules that explain the forecast of the global model and the counterfactual rules, which provide actionable insights for potential changes to obtain different outputs for given instances. We conduct experiments using a large-scale electricity demand dataset with exogenous features such as temperature and calendar effects. Here, we evaluate the quality of the explanations produced by the LIMREF framework in terms of both qualitative and quantitative aspects such as accuracy, fidelity, and comprehensibility, and benchmark those against other local explainers.
February
Date Speaker Title Abstract
25/02/2022 Ivan Dudanov Semantic Retrieval in Product Search Search engines have become a fundamental component in various industries, especially those that specialize in product search applications, e.g., retail, job search, content providers, etc. Due to the language gap between users and businesses, text-matching based retrieval techniques have shown their limits. Specifically, text may no longer be the best approach to represent a complex product entity, given its high dimensionality and composite nature (e.g., text and visual information). Semantic retrieval techniques provide new opportunities to solve these complex problems with neural representations of queries and products. Recently, both Elasticsearch and Solr have introduced an ANN (approximate nearest neighbor search) component in their new releases, which will potentially help businesses build more reliable semantic retrieval models for their search engines. In this talk, I will give an overview of semantic retrieval techniques and their applications in product search systems.

December
Date Speaker Title Abstract
02/12/2021 Jie Li FairGAN: GANs-based Fairness-aware Learning for Recommendations with Implicit Feedback Ranking algorithms in recommender systems influence people to make decisions. Conventional ranking algorithms based on implicit feedback data aim to maximize the utility to users by capturing users’ preferences over items. However, these utility-focused algorithms tend to cause fairness issues that require careful consideration in online platforms. Existing fairness-focused studies do not explicitly consider the problem of lacking negative feedback in implicit feedback data, while previous utility-focused methods ignore the importance of fairness in recommendations. To fill this gap, we propose a Generative Adversarial Networks (GANs) based learning algorithm, FairGAN, that maps the exposure fairness issue to the problem of negative preferences in implicit feedback data. FairGAN does not explicitly treat unobserved interactions as negative, but instead adopts a novel fairness-aware learning strategy to dynamically generate fairness signals. This optimizes the search direction to make FairGAN capable of searching the space of the optimal ranking that can fairly allocate exposure to individual items while preserving users’ utilities as high as possible. The experiments on four real-world data sets demonstrate that the proposed algorithm significantly outperforms the state-of-the-art algorithms in terms of both recommendation quality and fairness.
09/12/2021 12:00 pm Prof. Min Zhang (Tsinghua University), ADCS 2021 keynote presentation What users tell us: User Understanding and Modeling for Personalized Recommendation Personalized recommender systems have been one of the major ways of information acquisition. Understanding the user’s intent and behavior plays an important role and has been a trending topic in research and applications. In this talk, I will briefly introduce some observations and findings from our user understanding research in recent years, which are surprising or different from what people thought before. The talk covers user behavior analyses, dynamic intent modeling, and evaluation in multiple scenarios such as news streaming, e-commerce, job hunting, and music recommendation. Related research has been published in ACM TOIS, WWW, SIGIR, WSDM, etc.

November
Date Speaker Title Abstract
04/11/2021 CIKM Conference
11/11/2021 12:00 pm Liangjie Hong (LinkedIn) Computational Jobs Marketplace: Search, Recommendation and Beyond Online job marketplaces such as Indeed.com, CareerBuilder and LinkedIn Inc. are helping millions of job seekers find their next jobs while thousands of corporations and institutions fill their open positions at the same time. On top of that, the global COVID-19 pandemic, from 2020 up till now, has profoundly transformed workplaces, creating and driving remote work environments around the world. While this emerging industry has generated tremendous growth in the past several years, technological innovations around this industry have yet to come. In this talk, I will discuss how many technologies that the industry heavily relies on, such as search systems, recommender systems, and advertising systems, are deeply rooted in their more generic counterparts, which may not address unique challenges in this industry and may therefore hinder possibly better products serving both job seekers and recruiters. In addition, observations and evidence indicate that users, including job seekers, recruiters, and advertisers, behave differently on these novel two-sided marketplaces, calling for a better systematic understanding of users in this new domain.
11/11/2021 05:30 pm Olivier Jeunen (UAntwerpen) Advances in Bandit Learning for Recommendation: Pessimistic Reward Models & Equity of Exposure The “bandit learning” paradigm is an attractive choice to recommendation practitioners, because it allows us to optimise a model directly for the outcomes driven by our recommendations. Nevertheless, because we do not observe the outcomes of actions the system did not take, learning in such a setting is not straightforward. In the off-policy setting, where we learn from a fixed dataset of logged recommendations and their outcomes, selection bias of this form is prone to lead to problems of severe over-estimation. The first part of this talk will introduce Pessimistic Reward Models, and show how they can alleviate this bias, leading to significantly improved recommendation performance. In the on-policy setting, we learn from the outcomes of our own actions, and several provably optimal algorithms exist. But is blindly optimising a model for some user-focused notion of reward always what we want? The second part of this talk will discuss the “Equity of Exposure” principle in relation to Top-K recommendation problems, introducing an Exposure-Aware Arm Selection algorithm that can significantly improve fairness of exposure with a minimal impact on reward.
18/11/2021 Fernando Diaz (Google) Evaluating Stochastic Rankings with Expected Exposure We introduce the concept of expected exposure as the average attention ranked items receive from users over repeated samples of the same query. Furthermore, we advocate for the adoption of the principle of equal expected exposure: given a fixed information need, no item should receive more or less expected exposure than any other item of the same relevance grade. We argue that this principle is desirable for many retrieval objectives and scenarios, including topical diversity and fair ranking. Leveraging user models from existing retrieval metrics, we propose a general evaluation methodology based on expected exposure and draw connections to related metrics in information retrieval evaluation. Importantly, this methodology relaxes classic information retrieval assumptions, allowing a system, in response to a query, to produce a distribution over rankings instead of a single fixed ranking. We study the behavior of the expected exposure metric and stochastic rankers across a variety of information access conditions, including ad hoc retrieval and recommendation. We believe that measuring and optimizing expected exposure metrics using randomization opens a new area for retrieval algorithm development and progress.
25/11/2021 Michael Twidale (University of Illinois) Spectacles, training wheels, crutches, Pandora’s boxes, or Frankenstein’s monsters: thinking about metaphors in the design and use of novel technologies Metaphors have been much discussed in the field of human-computer interaction. Researchers have debated whether designing metaphors into interfaces is helpful, empowering, confusing, or limiting. Building on a preliminary analysis of metaphor use in Voice User Interfaces, I want to reflect on the implications of the metaphors that we design in, and those that people seem to use. Interface metaphors are sometimes seen as a crutch for less technical people to get to grips with a complex database, or a set of training wheels to help people learn effectively and build confidence. But cognitive linguists remind us that we all use metaphors all the time. What happens if we hold the metaphor mirror up to our own faces as designers, developers and researchers? What are the metaphors that we use, often without even noticing? Which alternative metaphors might we try on like a new hat that could help us get our thinking out of a rut? Can different metaphorical spectacles help us see an intractable problem in a different way? Do our metaphorical blinders prevent us from noticing opportunities - or the ways that designed systems, structures and metrics may be nudging us with various carrots and sticks? Can metaphor analysis shine a light on why we do what we do and help us notice a path less travelled that might lead to buried intellectual treasure?

October
Date Speaker Title Abstract
07/10/2021 Chenglong Ma (RMIT) Collaborative Filtering Meets Black Swan Events: A Study by Modelling User Needs Evolution The COVID-19 pandemic, as a typical “Black Swan” event, has caused a profound impact on the health and lifestyle of people all over the world. To cope with the sudden occurrence of such black-swan events, people have to change their ingrained behaviours to adapt to the turbulent and changeable living environment. Consequently, this choppy change has gravely affected Collaborative Filtering (CF) models that generate personalised recommendations based on people’s behaviour data. One of the conventional assumptions behind such systems is that user preferences are traceable from their historical data and not easy to change in a short time. However, the models, which are trained on existing “usual” behaviour, would be baffled by the “unusual” changes due to black swan events, and thus their performance is affected. This paper investigates what would happen when Collaborative Filtering meets Black Swan events and how this affects CF models. To achieve this, we propose an evaluation framework based on user dynamic interactions by modelling user needs evolution. The framework is capable of characterising the shift of the user preference and providing a global lens to understand the influence of black swan events on CF models. The experiment results show there is a significant herding effect during the black swan event, which introduces popularity bias.
14/10/2021 Mark Sanderson (RMIT) Are significance tests weaker than we previously thought? Note: while I gave a version of this talk before, the work has moved on since then. I will describe a recently proposed ANOVA model and compare it to two well-known baseline tests. I apply the analysis both to the runs of a whole TREC track, the typical approach, and to the runs submitted by six participant groups. The former reveals test behavior in the heterogeneous settings of a large-scale evaluation initiative; the latter, almost overlooked in past work (to the best of our knowledge), reveals what happens in the much more restricted case of variants of a single system, i.e. the typical context in which companies and research groups operate. I will describe how the novel test is strikingly consistent in large-scale settings, but worryingly inconsistent in some participant experiments. Of greater concern, the participant-only experiments show that one of our baseline tests (widely used in research) can produce a substantial number of inconsistent test results.
21/10/2021 The School of Engineering and Computing Technologies ECT Milestone Conference RMIT
28/10/2021 Justin Munoz (RMIT) The application of machine learning for supporting financial systems Research in recommender and decision support systems spans many domains and continues to grow; however, there are a few fields that suffer from slower progress. Little research has investigated the development of these systems in the financial services domain. Recommending financial products, or services, to customers is a difficult task. Small item sets, the infrequency of purchase activity, and the absence of explicit feedback inhibit the ability to use traditional recommender algorithms. Furthermore, decision support systems need to be carefully designed to ensure that these systems do not cause errors in financial reporting, which is critical for auditing purposes. In this research, we focus on the application of machine learning techniques for supporting systems in the financial services domain. Firstly, we introduce a bi-level approach to handle the complex nature of recommending loan products. Our experimental results highlight the importance of considering loan approval of borrowers when building a loan prospecting model. Secondly, we adopt a novel approach in improving autonomous bookkeeping tasks such as invoice code suggestion and bank reconciliation. Our research explores the use of state-of-the-art natural language processing techniques and hierarchical multi-label neural network architectures to enhance past efforts.

September
Date Speaker Title Abstract
02/09/2021 RMIT SLOW DOWN WEEK
09/09/2021 Chen Zhao (UMD) Towards more practical complex question answering Question answering is one of the most important and challenging tasks for understanding human language. With the help of large-scale benchmarks, state-of-the-art neural methods have made significant progress to even answer complex questions that require multiple evidence pieces. Nevertheless, training existing SOTA models requires several assumptions (e.g., intermediate evidence annotation, corpus semi-structure) that limit the applicability to only academic testbeds. In this talk, I discuss several solutions to make current QA systems more practical. I first describe a state-of-the-art system for complex QA with an extra hop attention in its layers to aggregate different pieces of evidence following the structure. Then I introduce a dense retrieval approach that iteratively forms an evidence chain through beam search in dense representations, without using semi-structured information. Finally, I describe a dense retrieval work that focuses on a weakly-supervised setting, by learning to find evidence from a large corpus, and relying only on distant supervision for model training.
10/09/2021 Aritra Mandal (eBay) Using AI to Understand Search Intent In this talk, I will explore how AI can help understand query intent in the context of ecommerce search. We will focus on two specific aspects of this problem: query categorization and query similarity. Specifically, I will describe how to train a query categorization model from engagement data and how to recognize equivalent queries using embeddings trained from search behavior.
16/09/2021 Sachin Pathiyan Cherumanal (RMIT) Evaluating Fairness in Argument Retrieval Existing commercial search engines often struggle to represent different perspectives of a search query. Argument retrieval systems address this limitation of search engines and provide both positive (PRO) and negative (CON) perspectives about a user’s information need on a controversial topic (e.g., climate change). The effectiveness of such argument retrieval systems is typically evaluated based on topical relevance and argument quality, without taking into account the often differing number of documents shown for the argument stances (PRO or CON). Therefore, systems may retrieve relevant passages, but with a biased exposure of arguments. In this work, we analyze a range of non-stochastic fairness-aware ranking and diversity metrics to evaluate the extent to which argument stances are fairly exposed in argument retrieval systems. Using the official runs of the argument retrieval task Touché at CLEF 2020, as well as synthetic data to control the amount and order of argument stances in the rankings, we show that systems with the best effectiveness in terms of topical relevance are not necessarily the most fair or the most diverse in terms of argument stance. The relationships we found between (un)fairness and diversity metrics shed light on how to evaluate group fairness – in addition to topical relevance – in argument retrieval settings.
23/09/2021 Laurianne Sitbon (QUT) Towards Inclusive Interactions in Information Retrieval The World Wide Web of 2021 is a platform for sharing, socialising and synthesizing vast amounts of information. However, for the approximately 3% of the population with intellectual disability, access remains limited. Most people with intellectual disability (ID) have reduced abilities to digest new or complex information, requiring specific accessible design. Yet they often do not fit a neatly labeled diagnostic category, often having a combination of underlying cognitive, communicative, motor and sensory conditions. In this talk, Laurianne will present what she and her team learnt through 5 years of fieldwork, co-designing interactive information access systems with adults with intellectual disability. She will demonstrate with examples how iterative approaches that centre on people’s competencies, and recognise support networks as part of key competencies, can ensure future designs are both inclusive and respectful of individuals with intellectual disability. She will discuss opportunities for continuing research to discover how recommender systems can better support visual and multimodal interactions in people’s own terms.
30/09/2021 Dana McKay (RMIT) and Stephann Makri (CUL) We are the change that we seek: Information Interaction during a change of view While there has been much discussion about fake news, social media, filter bubbles, and echo chambers, little research has examined the role of information in changing views from an individual’s viewpoint. Rather than assuming that everyone’s mind is made up by social media, without any agency, shouldn’t we be questioning their experience? How do they find or encounter information that changes views, and what is the role of information in that process? This talk will answer just these questions. We will report on a qualitative study of the view change experiences of 18 adults in the UK, focusing on the role of information in those view changes. We will address the new understanding of view change and information interaction generated by these findings and point to avenues for future research.

August
Date Speaker Title Abstract
05/08/2021 Manuel Steiner (RMIT) Untangling the Concept of Task in Information Seeking and Retrieval, ICTIR 2021 (paper reading) The paper discusses tasks in Information Seeking and Retrieval (ISR). Authors of past work often use different terminology to describe the same concept, or the same terms to refer to different concepts, when discussing tasks in ISR. The authors of this paper first provide an overview of previous literature. They highlight commonalities and differences between task hierarchies and surrounding concepts. They note that work roles are commonly absent from previous works. The paper then presents an integrated task taxonomy based on existing literature, consolidating concepts into a single model, which also includes work roles, based on a review of literature in work structure, management, and human resources.
12/08/2021
19/08/2021 Vincent Li (Coles) Challenges and Learnings in Product Search Product search is a specific type of search engine adopted by online businesses to help customers find relevant products. E-commerce search, job search, or hotel search are different applications of product search. Product search problems are different from conventional web search engines in many ways. For example, in product search, the documents are usually less descriptive and more structured, which makes classical retrieval models less effective. The search intents for product search engines are usually more than finding relevant documents and the definition of relevance has multiple dimensions rather than just topical relevance. Moreover, recall and precision are both important for product search, while the ranking of the products is critical for driving the business outcomes. All these differences pose new challenges for solving the product search problems. In this talk, I will discuss the existing challenges of product search problems and share some learnings working with product search applications.
26/08/2021

July
Date Speaker Title Abstract
01/07/2021 Danula Hettiachchi (ADMS) The Challenge of Variable Effort Crowdsourcing We consider a class of variable effort human annotation tasks in which the number of labels required per item can greatly vary (e.g., finding all faces in an image, named entities in a text, bird calls in an audio recording, etc.). In such tasks, some items require far more effort than others to annotate. Furthermore, the per-item annotation effort is not known until after each item is annotated since determining the number of labels required is an implicit part of the annotation task itself. On an image bounding-box task with crowdsourced annotators, we show that annotator accuracy and recall consistently drop as effort increases. We hypothesize reasons for this drop and investigate a set of approaches to counteract it. Firstly, we benchmark on this task a set of general best-practice methods for quality crowdsourcing. Notably, only one of these methods actually improves quality: the use of visible gold questions that provide periodic feedback to workers on their accuracy as they work. Given these promising results, we then investigate and evaluate variants of the visible gold approach, yielding further improvement. Final results show a 7% improvement in bounding-box accuracy over the baseline. We discuss the generality of the visible gold approach and promising directions for future research.
08/07/2021 Xi Wang (University of Glasgow) Personalised Usefulness of Reviews for Effective Recommendation Recent review-based recommenders have shown promising results in improving the recommendation performance on various public datasets. However, to make effective recommendations, it is both vital and challenging to accurately measure the usefulness of reviews. In particular, according to the literature, users have been shown to exhibit distinct preferences over different types of reviews (e.g. preferring longer vs. shorter or recent vs. old reviews). Yet, there have been limited studies that account for the personalised usefulness of reviews when estimating the users' preferences. This talk presents two consecutive studies addressing this research gap: 1) NCWS, a weakly supervised binary review helpfulness classifier; 2) RPRS, an end-to-end recommendation model that estimates users' preferences first over the reviews exhibiting various properties and then over the items of interest.
15/07/2021 SIGIR Conference
22/07/2021
29/07/2021

June
Date Speaker Title Abstract
03/06/2021 Pablo Castells (UAM) Rational and irrational bias in recommendation (17:00 GMT+10) Concern for bias in IR has grown considerably in the last few years, and recommender systems are a particular area where applications and experiments are immersed in bias. I will discuss some general angles on the biases that pervade recommendation, and what researchers are doing about it. I will follow up into some of my particular experience and findings in better understanding the effect of popularity biases on the effectiveness of recommendation and our capability to properly measure it in offline evaluation. I will discuss how the positive or negative effect of popularity in recommendation relates to the role of relevance in users' discoveries and choices, and in the formation of majorities, sometimes in non-obvious ways.
10/06/2021 Supervisors meeting
17/06/2021 Nick Craswell (Microsoft) Comparing and deploying deep learning models for Web Search In recent years, our idea of what rankers work best in Web search has completely changed. We used to believe that boosted tree models with hand-crafted features were the best approach, but new approaches, particularly with the current BERT-style rankers, seem significantly better. How do we convince ourselves that these models are really better? What’s the process we can use in a product to deploy improvements to ML rankers? To measure them in production? How do we deal with competing goals: Clicks, positive relevance labels, higher DAU/retention? I will describe some of our public-facing research in the area, since I can talk about that freely, but also link it to how things work in development and measurement of ML models in Bing (without revealing product details).
24/06/2021 The School of Engineering and Computing Technologies ECT Milestone Conference

May
Date Speaker Title Abstract
06/05/2021 Yongxin Xu (Monash University) Internet searching and stock price crash risk: Evidence from a quasi-natural experiment In 2010, Google unexpectedly withdrew its searching business from China, reducing investors’ ability to find information online. The stock price crash risk for firms searched for more via Google before its withdrawal subsequently increases by 19%, suggesting that Internet searching facilitates investors’ information processing. The sensitivity of stock returns to negative Internet posts also rises by 36%. The increase in crash risk is more pronounced when firms are more likely to hide adverse information and when information intermediaries are less effective in assisting investors’ information processing. In addition, liquidity (price delay) decreases (increases) after Google's withdrawal.
13/05/2021 Oleg Zendel (RMIT) An Enhanced Evaluation Framework for Query Performance Prediction Query Performance Prediction (QPP) has been studied extensively in the IR community over the last two decades. A by-product of this research is a methodology to evaluate the effectiveness of QPP techniques. In this paper, we re-examine the existing evaluation methodology commonly used for QPP, and propose a new approach. Our key idea is to model QPP performance as a distribution instead of relying on point estimates. Our work demonstrates important statistical implications, and overcomes key limitations imposed by the currently used correlation-based point-estimate evaluation approaches. We also explore the potential benefits of using multiple query formulations and ANalysis Of VAriance (ANOVA) modelling in order to measure interactions between multiple factors. The resulting statistical analysis combined with a novel evaluation framework demonstrates the merits of modelling QPP performance as distributions, and enables detailed statistical ANOVA models for comparative analyses to be created.
20/05/2021 Paul Thomas (Microsoft) Do affective cues validate behavioural metrics for search? Traces of searcher behaviour, such as query reformulation or clicks, are commonly used to evaluate a running search engine. The underlying expectation is that these behaviours are proxies for something more important, such as relevance, utility, or satisfaction. Affective computing technology gives us the tools to help confirm some of these expectations, by examining visceral expressive responses during search sessions. However, work to date has only studied small populations in laboratory settings and with a limited number of contrived search tasks. In this study, we analysed longitudinal, in-situ, search behaviours of 152 information workers, over the course of several weeks while simultaneously tracking their facial expressions. Results from over 20,000 search sessions and 45,000 queries allow us to observe that indeed affective expressions are consistent with, and complementary to, existing “click-based” metrics. On a query level, searches that result in a short dwell time are associated with a decrease in smiles (expressions of “happiness”), and if a query is reformulated, the results of the reformulation are associated with an increase in smiling – suggesting a positive outcome as people converge on the information they need. On a session level, sessions that feature reformulations are more commonly associated with fewer smiles and more furrowed brows (expressions of “anger/frustration”). Similarly, sessions with short-dwell clicks are also associated with fewer smiles. These data provide an insight into visceral aspects of search experience and present a new dimension for evaluating engine performance. (This is work with Daniel McDuff, Nick Craswell, Kael Rowan, and Mary Czerwinski, all at Microsoft.)
27/05/2021 Romy Menghao Jia (UniSA) LGBTQ+ individuals’ information seeking and sharing in an online community Individuals with LGBTQ+ identities are at risk of negative outcomes due to their highly stigmatized sexual and gender minority status. Online communities are especially important to this population group, as they provide a safe and validating space for building interpersonal connections and fostering a sense of belonging. Focused on the affective and motivational aspects of LGBTQ+ individuals’ information behaviour, our work investigates the emotions LGBTQ+ individuals expressed, and their relatedness needs that motivated them to seek and share information in an online community. Through a deductive thematic analysis of 156 posts to an LGBTQ+ online forum, our analysis reveals three main categories of relatedness needs: being cared about, caring for others, as well as building and maintaining relationships. Sixty-one posts that contained emotional texts were further coded to analyse the emotions expressed by LGBTQ+ individuals. Seven categories of emotions emerged from the analysis: fear, uncertainty, sadness, anger, shame, joy, and others. Our work contributes to the existing knowledge of how LGBTQ+ individuals cope with various challenges and how online communities can better support sexual and gender minority people. Future work will investigate the influence of their information behaviour on their resilience and affordances implications for LGBTQ+ online community design.

April
Date Speaker Title Abstract
01/04/2021 Leila Tavakoli (RMIT) Analysing Clarification in Asynchronous Information Seeking Conversations This research analyses human-generated clarification questions to provide insights into how they are used to disambiguate and provide a better understanding of information needs. A set of clarification questions is extracted from posts on the Stack Exchange platform. A novel taxonomy is defined for the annotation of the questions and their responses. Our results indicate that questions answered by the person who submitted the original post (the asker) are more likely to be informative and to increase the chance of getting an accepted answer. After identifying which clarification questions are more useful, we investigate the characteristics of these questions in terms of their types and patterns. Non-useful clarification questions are identified, and their patterns are compared with useful clarifications. Our analysis indicates that the most useful clarification questions have similar patterns, regardless of topic. This research contributes to an understanding of clarification in conversations and can provide insight for clarification dialogues in conversational search scenarios and for the possible system generation of clarification requests in information seeking conversations.
08/04/2021 Valeriia Baranova (RMIT) A taxonomy of non-factoid questions Non-factoid question answering (NFQA) is a challenging task where a system should return complex long-form answers (such as explanations or opinions) to open-ended questions. Although researchers are working on datasets and systems in this area, so far the performance of the latter falls far behind systems created to answer factoid questions. The reason could be that the form of answers that an ideal NFQA system should return varies greatly depending on the category of asked question. In our study, we propose the first comprehensive non-factoid question taxonomy constructed by employing grounded theory and extensively evaluated via several crowdsourcing studies. Along with taxonomy, we provide a dataset of non-factoid question categories and a performant model for question category prediction. Both will be made publicly available to the research community. Finally, we conducted an analysis of non-factoid question category distribution in various existing QA and conversational datasets, showing that the most challenging question categories for the existing NFQA systems are poorly represented in these datasets. We believe that a better understanding of question categories of non-factoid questions and the structure of target answers will greatly aid the research in this area.
15/04/2021 Shohreh Deldari (RMIT) Time Series Change Point Detection with Self-Supervised Contrastive Predictive Coding Change Point Detection (CPD) methods identify the times associated with changes in the trends and properties of time series data in order to describe the underlying behaviour of the system. For instance, detecting the changes and anomalies associated with web service usage, application usage or human behaviour can provide valuable insights for downstream modelling tasks. We propose a novel self-supervised Time Series Change Point detection method based on Contrastive Predictive Coding (TS-CP^2). TS-CP^2 is the first approach to employ a contrastive learning strategy for CPD by learning an embedded representation that separates pairs of embeddings of time adjacent intervals from pairs of interval embeddings separated across time. Through extensive experiments on three diverse, widely used time series datasets, we demonstrate that our method outperforms five state-of-the-art CPD methods, which include unsupervised and semi-supervised approaches. TS-CP^2 is shown to improve the performance of methods that use either handcrafted statistical or temporal features by 79.4% and deep learning-based methods by 17.0% with respect to the F1-score averaged across the three datasets.
22/04/2021 Mark Sanderson (RMIT) How Do You Test a Test? Examining Significance from Different Angles In this talk, I will describe a suite of measures, which are jointly used to investigate a recently proposed ANOVA model based significance test and to compare it to two well-known baselines. I will apply the measures both to the runs of a TREC track and to the runs submitted by single participants. The former reveals test behaviour in the heterogeneous settings of a large-scale evaluation initiative; the latter lets us know what happens in the much more restricted case of variants of a single system. The results of our study show the novel ANOVA model to be substantially better than a commonly used significance test found in many IR research papers.
29/04/2021

March
Date Speaker Title Abstract
03/03/2021 Rosie Jones (Spotify) Research on Podcasts at Spotify Podcasts are a large and growing repository of spoken audio. As an audio format, podcasts are more varied in style and production type than broadcast news, contain more genres than typically studied in video data, and are more varied in style and format than previous corpora of conversations. When transcribed with automatic speech recognition they represent a noisy but fascinating collection of documents which can be studied through the lens of natural language processing, information retrieval, and linguistics. Paired with the audio files, they are also a resource for speech processing and the study of acoustic aspects of the domain. We introduce the Spotify Podcast Dataset, a new corpus of 100,000 podcasts. We demonstrate the complexity of the domain with a case study of two tasks drawn from information retrieval and NLP: (1) passage search and (2) summarization. This data is orders of magnitude larger than previous speech corpora used for search and summarization. Our results show that the size and variability of this corpus open up new avenues for research. I’ll discuss some of those avenues, as well as giving a brief overview of research on podcasts and music at Spotify.
11/03/2021
18/03/2021 CHIIR Conference
25/03/2021 Yuta Saito (Tokyo Institute of Technology) Towards Realistic and Reproducible Off-Policy Evaluation: Open-Source Dataset, Software, and Application in Fashion E-Commerce Recommendation There is a growing interest in off-policy evaluation (OPE) or offline evaluation in web search and data mining; we have witnessed great research progress over the past decade. There is, however, a critical issue in the current OPE research: all existing experiments are either unrealistic or irreproducible, creating a gap between theory and practice. To bridge this gap and push OPE forward as a practical method, we are running an open-source research project called the Open Bandit Project. The project includes the Open Bandit Dataset and the Open Bandit Pipeline. The Open Bandit Dataset is a real-world public dataset collected on the ZOZOTOWN platform, the largest fashion e-commerce platform in Japan. The dataset is unique; it contains two sets of log data collected by running multiple different recommendation policies, enabling fair comparisons of different OPE methods for the first time. We also implement a Python software package, Open Bandit Pipeline, to streamline and standardize the implementation of OPE both in research and practice. In this talk, I will share the basic formulation and discuss current issues in OPE research. Then, I will introduce our open-source project and show a proof-of-concept live demonstration of OPE with our software. Finally, I will talk about some real-world applications in the ZOZOTOWN recommendation interface.

January and February were a restructuring period in 2021