Show simple item record

dc.contributor.advisor | Curry, Edward
dc.contributor.author | Pavlopoulou, Niki
dc.date.accessioned | 2022-03-08T12:19:35Z
dc.date.available | 2022-03-08T12:19:35Z
dc.date.issued | 2021-12-21
dc.identifier.uri | http://hdl.handle.net/10379/17032
dc.description.abstract | The Internet of Things (IoT) has contributed to physical devices generating entity-centric data (e.g. smart buildings). To bridge the gap between the devices’ data and the users’ interests, Publish/Subscribe systems (Pub/Sub) are suitable middleware to deal with dynamic large-scale IoT applications due to their decoupling traits. However, the IoT poses more challenges than dynamism related to data and users. Specifically, data can be voluminous and heterogeneous due to integration or enrichment, as well as redundant or semantically similar due to the sensors’ spatial proximity. Existing approaches tackle semantic interoperability through ontologies and taxonomies, resulting in rigidness, non-scalability, and domain-dependency. At the same time, users can either create representationally-coupled queries that could be complex (e.g. SPARQL), independent of their data knowledge and expertise, or simple queries that lead to redundant information, which can overwhelm them. Existing approaches either use complex queries or create high-level data abstractions that are either not usable or complex for dynamic environments and suffer from representational coupling. This thesis addresses these problems and analyses two research questions involving the formulation of a new Pub/Sub scheme: the Entity-centric Publish/Subscribe Summarisation System, which involves user-friendly and contextually-aware subscriptions as well as extractive and abstractive summarisation approaches for the publications. Its goal is to address usability, user expressibility, data expressiveness, user and data effectiveness, and system efficiency. Three approaches are proposed: PubSum, IoTSAX, and PoSSUM. PubSum is a dynamic diverse entity summarisation of heterogeneous Linked Data streams through windowing policies, embedding-based DBSCAN clustering, and geometric-based top-k ranking. IoTSAX is a dynamic abstractive summarisation of heterogeneous numerical entity graph streams through enhanced Symbolic Aggregate approXimation (SAX) and approximate rule-based reasoning. PoSSUM is an extractive and abstractive diverse summarisation of heterogeneous numerical and Linked Data streams through novel partly-incremental conceptual clustering based on embedding models and variance as well as contextual-based top-k ranking. As an example, doctors are not experts in query languages and are unaware of the content and representations of patient data in a system. The proposed system will require a simple patient-centric subscription that will create a summary as a notification. This summary will be abstractive by interpreting the shape of real-time health sensor readings and providing a high-level inference, as well as extractive by including the most important and conceptually/contextually diverse information coming from external sources (e.g. personal information). The proposed system has been extensively evaluated on synthetic and real-world data from the domains of Healthcare and Smart Cities, achieving comparable results in correctness and system performance. Specifically, PubSum, involving DBpedia data, achieves up to 92% reduction of forwarded messages, 69.3% duplication reduction, and 0.95 redundancy-aware F-score compared to traditional Pub/Sub, but at the expense of 4 times higher latency, while achieving 6 times lower latency and 3 times less memory compared to the state-of-the-art diverse entity summarisation, with throughput ranging from 833 to 1,005 events/second. IoTSAX, involving real-world heterogeneous data related to Healthcare and Smart Cities, achieves up to 0.87 reasoning F-score and 98% reduction of forwarded messages, and outperforms the original SAX in approximation error (2 to 3 times less) and compression space-saving percentage when data redundancy occurs (from 71.75% to 94.99%) while maintaining similar or better latency and throughput. Its latency is 2 to 3 times higher than that of traditional Pub/Sub, and its throughput ranges from 13.231 to 97.393 events/second. PoSSUM, involving real-world heterogeneous data, discovers a desire for data diversity in up to 80% of users and achieves the best summary quality for more than half of the entities as well as the best conceptual clustering F-score, from 0.69 to 0.83, compared to traditional Pub/Sub and the state-of-the-art diverse entity summarisation. It also achieves up to 0.95 redundancy-aware F-score and 99% message reduction compared to traditional Pub/Sub. Finally, it has lower clustering processing time, scoring and memory consumption, and comparable latency and throughput. | en_IE
dc.publisher | NUI Galway
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights | CC BY-NC-ND 3.0 IE
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject | Publish/Subscribe Systems | en_IE
dc.subject | Event-based Systems | en_IE
dc.subject | Internet of Things | en_IE
dc.subject | Entity Summarisation | en_IE
dc.subject | Abstractive Summaries | en_IE
dc.subject | Extractive Summaries | en_IE
dc.subject | Data Approximation | en_IE
dc.subject | Approximate Reasoning | en_IE
dc.subject | Data Fusion | en_IE
dc.subject | Data Diversity | en_IE
dc.subject | Graph Streams | en_IE
dc.subject | RDF Graphs | en_IE
dc.subject | Word Embeddings | en_IE
dc.subject | Ontologies | en_IE
dc.subject | Thesaurus | en_IE
dc.subject | Taxonomies | en_IE
dc.subject | Science and Engineering | en_IE
dc.subject | Computer Science | en_IE
dc.subject | Data Science | en_IE
dc.title | Entity summarisation for entity-centric publish/subscribe systems | en_IE
dc.type | Thesis | en
dc.contributor.funder | Big Data Value ecosystem (BDVe) | en_IE
dc.contributor.funder | Science Foundation Ireland | en_IE
dc.contributor.funder | Horizon 2020 | en_IE
dc.contributor.funder | European Regional Development Fund | en_IE
dc.local.note | This thesis contributes an entity-centric Publish/Subscribe framework for the Internet of Things that supports user-friendly diversity-aware summarisations of heterogeneous data streams in an effective and efficient manner. | en_IE
dc.local.final | Yes | en_IE
dcterms.project | info:eu-repo/grantAgreement/EC/H2020::CSA/732630/EU/Big Data Value ecosystem/BDVe | en_IE
nui.item.downloads | 43


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland