dc.contributor.advisor: Duggan, Jim
dc.contributor.advisor: Rebholz-Schuhmann, Dietrich
dc.contributor.advisor: Buitelaar, Paul
dc.contributor.author: Marques Barros, Joana Carina
dc.date.accessioned: 2021-01-13T14:22:55Z
dc.date.available: 2021-01-13T14:22:55Z
dc.date.issued: 2021-01-13
dc.identifier.uri: http://hdl.handle.net/10379/16449
dc.description.abstract: The introduction of digital data sources has positively impacted public health surveillance and has paved the way for novel approaches. Internet-based sources provide large volumes of data which can be analysed in near real-time and directly address limitations of traditional sources for disease surveillance. For example, the timeliness of these sources can be exploited for infectious disease outbreak detection, while their descriptive content provides informal health reports useful for monitoring noncommunicable illnesses such as diabetes. In this context, this thesis aims to provide a deeper understanding of how Internet-based sources are, and can be, used for public health monitoring. Hence, it falls within the discipline of health informatics and applies infodemiology and digital health, with potential applications in infoveillance. Our focus is on three sources of informal health reports: microblogs (Twitter), discussion forums (Reddit), and search queries (Google Trends). The reasoning behind this choice is threefold: 1) Twitter and Reddit are types of social media, hence they directly capture users’ input; 2) the nature of these social media sources is complementary: while Twitter is used to share spontaneous thoughts, reports from Reddit tend to be lengthy and more contextualised; 3) Google search queries offer a channel to study the search behaviour of potential patients worldwide. With Twitter, we targeted its potential use for global monitoring of infectious and noncommunicable illnesses, while also exploring the effect of disease transmission hot-spots. Our findings showed that Twitter is not suitable for global disease monitoring, as it suffers from low recall when standard terminological resources are applied, and that transmission hot-spots do not show an increased number of disease mentions.
Reddit forum posts encourage discussion and facilitate the exchange of information; hence, the contextual richness of this source is superior to that of the short and mostly isolated messages on Twitter. Given this, the research on Reddit focused on how well disease mentions can be classified. Using contextualised representation models and a hierarchical neural text classification architecture, we achieved F1-scores of 0.992 and 0.674 for the classification of 6 infectious and 17 non-infectious diseases, respectively. For Google Trends, we developed a system for forecasting suicide occurrences in the Republic of Ireland, in which search volumes are used in parallel with official suicide statistics. We further explored whether the relevant search queries generalise to suicide occurrence prediction in a distinct country with a shared language. Using a neural autoregression model, we achieved a mean absolute error of 4.14 for Ireland when using the search query "feeling down", and 6.09 for the United Kingdom when using 34 search queries and unemployment data. The contributions of this thesis are four-fold: first, a comprehensive systematic literature review with a focus on Internet-based sources and their limitations, the diseases targeted, and standard methods for disease surveillance; second, an evaluation of Twitter’s capacity to provide health-related reports from disease transmission hot-spots for infectious and noncommunicable illnesses; third, a demonstration of the potential of Reddit for informal health report classification; fourth, an improvement in the prediction of suicide occurrences in Ireland, together with the finding that search queries selected for Ireland also contribute positively to the modelling of suicide occurrences in the United Kingdom. [en_IE]
dc.publisher: NUI Galway
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: Public Health [en_IE]
dc.subject: Disease Surveillance [en_IE]
dc.subject: Social Media [en_IE]
dc.subject: Twitter [en_IE]
dc.subject: Reddit [en_IE]
dc.subject: Google Trends [en_IE]
dc.subject: Infodemiology [en_IE]
dc.subject: Natural Language Processing [en_IE]
dc.subject: Noncommunicable Diseases [en_IE]
dc.subject: Infectious Diseases [en_IE]
dc.subject: Outbreak Detection [en_IE]
dc.subject: Time-Series [en_IE]
dc.subject: Autoregressive Models [en_IE]
dc.subject: Neural Autoregression Model [en_IE]
dc.subject: Clustering [en_IE]
dc.subject: Neural Networks [en_IE]
dc.subject: Text Classification [en_IE]
dc.subject: Machine Learning [en_IE]
dc.subject: Engineering and Informatics [en_IE]
dc.subject: Science and Engineering [en_IE]
dc.subject: Information technology [en_IE]
dc.title: Applying informal health reports and search queries for public health monitoring: An evaluation of characteristics, potentials, and requirements of online self-reporting, discussions, and search behaviour [en_IE]
dc.type: Thesis [en]
dc.contributor.funder: Science Foundation Ireland [en_IE]
dc.contributor.funder: European Regional Development Fund [en_IE]
dc.local.note: This thesis provides a deeper understanding of how Internet-based sources are, and can be, used for public health monitoring. Leveraging natural language processing and machine learning techniques, this work evaluates the characteristics, potentials, and requirements of informal health reports from microblogs (Twitter), discussion forums (Reddit), and search queries (Google Trends). [en_IE]
dc.local.final: Yes [en_IE]
nui.item.downloads: 480


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland