Utilizing passage-based evidences in information retrieval tasks
Date
2023-03-09
Author
Sarwar, Ghulam
Abstract
In the age of information overload, information is only a few clicks away. This
wealth of information certainly has its advantages. However, for a search
engine, identifying the relevant information within this volume of data remains a challenging task.
In particular, finding pertinent information in lengthy documents is difficult because queries are
expressed in natural language and documents are topically diverse. Rather than
considering the document as a whole, one viable method is to measure the relevance of
a query to the smaller units (passages) of a given document and to use those measurements
to evaluate query-document relevance. This thesis aims to utilize these smaller
units of a document, known as passages, in different information retrieval tasks.
A passage is defined as a sequence of sentences or words that starts and ends at any place
within a given document. Passage retrieval deals with identifying and retrieving small but
explanatory portions of a document that answer a user’s query. In this thesis, we first present
a novel approach to improving document ranking by using different kinds of passage-based
evidence. We evaluated our approach against existing passage retrieval methods, and a more
in-depth analysis was undertaken into the effect of varying specific. We also explored
the notion of query difficulty to understand whether the best-performing passage-based
approach improves performance for certain queries.
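
As an illustration only, the idea of using passage-level evidence for document ranking can be sketched as follows. The sketch assumes fixed-size, non-overlapping word windows as passages, term-frequency cosine similarity as the scoring function, and the best passage score as the document's evidence. All function names and parameters here are hypothetical and are not taken from the thesis, which considers several kinds of passage-based evidence.

```python
import math
from collections import Counter


def split_passages(text, size=5):
    """Split text into consecutive, non-overlapping windows of `size` words."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)]


def cosine(a, b):
    """Cosine similarity between two token lists using raw term frequencies."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def best_passage_score(query, text, size=5):
    """Assumed evidence: the highest query-passage similarity in the document."""
    q = query.lower().split()
    return max((cosine(q, p) for p in split_passages(text, size)), default=0.0)


def rank_documents(query, docs, size=5):
    """Return document ids ordered by their best passage score, highest first."""
    return sorted(
        docs, key=lambda d: best_passage_score(query, docs[d], size), reverse=True
    )
```

In a topically mixed document, the best passage can match the query more closely than the document as a whole, which is the intuition this sketch tries to capture.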
Secondly, we present a novel graph approach that uses the similarity of passages
within their parent document to form a cohesion structure. We discuss the observation that relevant
documents tend to be more cohesive than non-relevant documents. Furthermore, we
re-ranked the documents by combining the cohesion score with a document similarity score to
examine its impact on the system’s performance.
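
One possible reading of a cohesion score is sketched below, purely as an assumption: the mean pairwise similarity between the passages of a document, combined with the document similarity score by linear interpolation. The Jaccard measure, the `weight` parameter, and the function names are hypothetical choices for illustration and may differ from the formulation used in the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap between the vocabularies of two passages (token lists)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cohesion(passages):
    """Assumed cohesion: mean similarity over all pairs of passages in a document."""
    pairs = list(combinations(passages, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(p, q) for p, q in pairs) / len(pairs)


def rerank_score(doc_score, cohesion_score, weight=0.5):
    """Hypothetical combination: linear interpolation of the two scores."""
    return (1 - weight) * doc_score + weight * cohesion_score
```

Under this sketch, a document whose passages share vocabulary receives a higher cohesion score than one whose passages cover unrelated topics.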
Moreover, we carried out experiments using different sliding windows around words
in each passage to determine context and semantic relatedness. We then compared the
state-of-the-art pseudo relevance feedback (PRF) technique with our proposed passage-based
sliding window approach for query expansion. The use of top-ranked passages for query
expansion was motivated by the expectation that relevant passages would
remove the noise found in a text document that covers a number of topics. We
extended our approach by including a popular word embedding (WE) approach, i.e.,
word2vec, and demonstrated that the passage-based PRF and WE approaches outperform
their document-based equivalents.
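
A minimal sketch of sliding-window query expansion over top-ranked passages is given below. It assumes that candidate terms are those co-occurring with a query term within a fixed window and that candidates are ranked by co-occurrence count. The function name, the `window` and `k` parameters, and the counting scheme are hypothetical, and the word2vec extension is not covered here.

```python
from collections import Counter


def expansion_terms(query, passages, window=2, k=3, stopwords=frozenset()):
    """Pick the k terms that most often appear within `window` words of a query term."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for passage in passages:
        words = passage.lower().split()
        for i, word in enumerate(words):
            if word not in query_terms:
                continue
            # Window of `window` words on each side of the query term occurrence.
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for neighbour in words[lo:hi]:
                if neighbour not in query_terms and neighbour not in stopwords:
                    counts[neighbour] += 1
    return [term for term, _ in counts.most_common(k)]
```

Terms from passages that never mention the query are ignored, which mirrors the motivation of keeping off-topic text out of the expansion.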
Lastly, we represented the passage answer set for each query as a graph and applied different
graph-based measures to identify a correlation between the relevance of a document and
those graph measures. Our approach was inspired by the cluster hypothesis, which
states that similar entities are more likely to be closer to each other. We also discuss an
application of our answer-set graph approach to Query Performance Prediction (QPP) tasks and
a future avenue for applying it to topic visualization. We show that our passage-based
graph features outperform existing state-of-the-art QPP approaches and yield a
positive correlation when distinguishing easy from hard queries.
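
To make the answer-set graph idea concrete, the sketch below builds a graph over retrieved passages and computes one simple graph measure. It assumes that two passages are linked when their Jaccard vocabulary overlap reaches a threshold and uses graph density as the example feature. The threshold, the similarity measure, and the choice of density are hypothetical; the thesis applies its own set of graph-based measures.

```python
from itertools import combinations


def build_graph(passages, threshold=0.2):
    """Return the set of edges (i, j) between sufficiently similar passages."""
    edges = set()
    for (i, p), (j, q) in combinations(enumerate(passages), 2):
        sp, sq = set(p.lower().split()), set(q.lower().split())
        union = sp | sq
        if union and len(sp & sq) / len(union) >= threshold:
            edges.add((i, j))
    return edges


def density(n_nodes, edges):
    """Fraction of possible undirected edges that are present in the graph."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0
```

A feature such as this density could then be computed per query and compared against retrieval effectiveness, for example with a rank correlation coefficient.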