dc.contributor.advisor O'Riordan, Colm dc.contributor.advisor Pasi, Gabriella dc.contributor.author Qureshi, Muhammad Atif dc.date.accessioned 2015-10-12T14:21:12Z dc.date.available 2015-10-12T14:21:12Z dc.date.issued 2015-10-08 dc.identifier.uri http://hdl.handle.net/10379/5304 dc.description.abstract The process whereby inferences are made from textual data is broadly referred to as text mining. In order to ensure the quality and effectiveness of the derived inferences, several approaches have been proposed for different text mining applications. Among these applications, classifying a piece of text into pre-defined classes through the utilisation of training data falls into supervised approaches while arranging related documents or terms into clusters falls into unsupervised approaches. In both these approaches, processing is undertaken at the level of documents to make sense of text within those documents. Recent research efforts have begun exploring the role of knowledge bases in solving the various problems that arise in the domain of text mining. Of all the knowledge bases, Wikipedia on account of being one of the largest human-curated, online encyclopaedia has proven to be one of the most valuable resources in dealing with various problems in the domain of text mining. However, previous Wikipedia-based research efforts have not taken both Wikipedia categories and Wikipedia articles together as a source of information. en_US This thesis serves as a first step in eliminating this gap and throughout the contributions made in this thesis, we have shown the effectiveness of Wikipedia category-article structure for various text mining tasks. Wikipedia categories are organized in a taxonomical manner serving as semantic tags for Wikipedia articles and this provides a strong abstraction and expressive mode of knowledge representation. In this thesis, we explore the effectiveness of this mode of Wikipedia's expression (i.e., the category-article structure) via its application in the domains of text classification, subjectivity analysis (via a notion of perspective" in news search), and keyword extraction. First, we show the effectiveness of exploiting Wikipedia for two classification tasks i.e., 1- classifying the tweets being relevant/irrelevant to an entity or brand, 2- classifying the tweets into different topical dimensions such as tweets related with workplace, innovation, etc. To do so, we define the notion of \textit{relatedness} between the text in tweet and the information embedded within the Wikipedia category-article structure. Then, we present an application in the area of news search by using the same notion of \textit{relatedness} to show more information related to each search result highlighting the amount \textit{perspective} or subjective bias in each returned result towards a certain opinion, topical drift, etc. Finally, we present a keyword extraction strategy using community detection over the Wikipedia categories to discover related keywords arranged in different communities. The relationship between Wikipedia categories and articles is explored via a textual phrase matching framework whereby the starting point is textual phrases that match Wikipedia articles' titles/redirects. The Wikipedia articles for which a match occurs are then utilised by extraction of their associated categories, and these Wikipedia categories are used to derive various structural measures such as those relating to taxonomical depth and Wikipedia articles they contain. These measures are utilised in our proposed text classification, subjectivity analysis, and keyword extraction framework and the performance is analysed via extensive experimental evaluations. These experimental evaluations undertake comparisons with standard text mining approaches in the literature and our Wikipedia framework based on its category-article structure outperforms the standard text mining techniques. dc.subject Wikipedia en_US dc.subject Category-article structure en_US dc.subject Text mining en_US dc.subject Information technology en_US dc.subject Engineering & Informatics en_US dc.title Utilising Wikipedia for text mining applications en_US dc.type Thesis en_US dc.local.note Making decisions from textual data is a complex undertaking requiring a manual labor. The processing ability of computers has made this task easy giving birth to the research field known as text mining. The work in this thesis aims towards improvement of text mining by utilising knowledge in Wikipedia. en_US dc.local.final Yes en_US nui.item.downloads 1495
﻿

### Files in this item

This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.

The following license files are associated with this item: