Suggestion mining from text
With the ever-growing availability of opinions on the web, opinion mining has become a popular area of research in natural language processing and applied machine learning. We argue that opinion summaries based solely on the sentiment polarity of text towards an entity of interest exclude the explicit recognition and extraction of information that appears in the form of recommendations, tips, and advice. Different stakeholders often seek suggestions and advice through surveys and suggestion forms, where respondents are explicitly asked to provide them. Customers also tend to mention suggestions for new features spontaneously in product reviews or tweets about a product, and recommendations for nearby places to eat are often found in hotel reviews. Yet sentiment analysis remains the most popular opinion mining task performed on these data sources. In this thesis, we investigate the automatic extraction of suggestions from text, referred to as Suggestion Mining. Suggestion Mining is framed as a sentence classification task, in which the sentences of a given text are automatically labeled as suggestions or non-suggestions. Given the very limited amount of related work, suggestion mining can be considered a young research problem. The research questions investigated in this dissertation therefore address some of its core aspects: a task definition that formally defines the scope of the suggestion and non-suggestion classes; the development of benchmark datasets; the evaluation of manually identified features for supervised learning methods, as well as of representation learning; and the introduction of distant supervision approaches for settings that lack domain-specific training datasets. While covering these aspects, this thesis primarily revolves around two computational tasks: sentence classification and representation learning.
The thesis adopts some popular deep learning concepts and methods for suggestion mining, such as word embeddings and Long Short-Term Memory networks. This also opens up a young sentence classification task, and its corresponding benchmark datasets, to the deep learning community. The contributions of this thesis are manifold. A formal task definition for suggestion mining is provided, accompanied by a qualitative and quantitative analysis of suggestions and a formal definition of suggestions in the context of suggestion mining. Benchmark datasets from multiple domains are developed and released, together with a robust data annotation methodology that balances the cost and quality of manual annotations. Manually selected features used with Support Vector Machine classifiers are evaluated in depth in domain-specific, domain-independent, and cross-domain training scenarios. A combination of features from related work and our newly proposed features is shown to outperform models that use either set alone. The syntactic features are also found to remain consistently the top-performing features across all experiments. A major contribution of the thesis is the creation of a large silver-standard dataset composed of sentences from Wikihow and Wikipedia, and the validation of a method that uses this dataset for automatically learning features, i.e., representation learning. Experiments comparing manually selected features with automatically learned features, i.e., word embeddings, show that embeddings representing part-of-speech tags outperform state-of-the-art pre-trained word embeddings for this task. The thesis performs an end-to-end exploration of suggestion mining, with all evaluations performed for domain-specific, open-domain, and cross-domain classification scenarios.
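To make the sentence classification framing above concrete, the following is a minimal illustrative sketch and not the method evaluated in the thesis. It swaps the Support Vector Machine and learned representations for a plain rule-based baseline over two hand-written surface features. The cue lists, the feature names, and the functions `tokenize`, `extract_features`, and `classify` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical cue lists for illustration only; they are NOT taken from the thesis.
SUGGESTION_CUES = ("should", "recommend", "suggest", "would be nice", "make sure", "please")
IMPERATIVE_STARTS = ("try", "avoid", "make", "use", "bring", "ask", "consider", "add")


def tokenize(sentence):
    """Lowercase the sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(sentence):
    """Return two hand-written binary features for one sentence."""
    tokens = tokenize(sentence)
    # Pad with spaces so cues match whole tokens only ("try" must not match "country").
    padded = " " + " ".join(tokens) + " "
    return {
        "has_cue": any(" " + cue + " " in padded for cue in SUGGESTION_CUES),
        "imperative_start": bool(tokens) and tokens[0] in IMPERATIVE_STARTS,
    }


def classify(sentence):
    """Label a sentence as 'suggestion' if any feature fires, else 'non-suggestion'."""
    features = extract_features(sentence)
    return "suggestion" if any(features.values()) else "non-suggestion"


if __name__ == "__main__":
    examples = [
        "You should add a night mode to the app.",
        "The room was clean and quiet.",
        "Try the bakery next to the hotel.",
    ]
    for sentence in examples:
        print(classify(sentence), "|", sentence)
```

A supervised setup of the kind the abstract describes would replace the fixed rule in `classify` with a classifier trained on labeled sentences, using features like these (or learned embeddings) as its input.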
This item is available under the Attribution-NonCommercial-NoDerivs 3.0 Ireland license. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this is made available, or to notes contained in the item itself. Other terms may apply.