Expressive RDF stream reasoning via data parallelism in answer set programming
Pham, Le Thi Anh Thu
The Web is now highly dynamic, with massive amounts of data continuously generated by a huge number of devices and services across the Internet. Application scenarios in domains such as environment monitoring, health care, and smart transportation can benefit greatly from the ability to efficiently integrate and query data streams from these sources to provide better services. In such applications, however, capturing data streams is not enough: it is equally important to extract insights from them and use these insights to address users' needs, preferences, and constraints. For this reason, various complex reasoning tasks must be designed and executed efficiently over such streams to capture users' sophisticated requirements. Stream Reasoning is an emerging research area that focuses on providing continuous complex reasoning capabilities over data streams. However, it faces many challenges, due not only to the heterogeneity of the streams but also to the exponential growth in the availability of streaming data on the Web, which severely limits the complexity of reasoning that can be used to extract actionable knowledge in a scalable and reliable way. The key challenge addressed in this thesis is to enable expressive reasoning over massive, distributed, heterogeneous data streams in a scalable way. I address this problem by integrating Semantic Web technologies for semantic integration, Answer Set Programming (ASP) for expressive reasoning, and Data Stream Management Systems for stream processing. The trade-off between scalability and expressivity in Stream Reasoning is considered, and parallel reasoning techniques are proposed to enhance scalability while retaining some of the key reasoning capabilities that are more expressive but also computationally more expensive.
The thesis addresses two research questions concerning how the expressivity and scalability of a reasoner can be improved when reasoning over Semantic Web data streams. For the first research question, which targets expressivity, I propose C-ASP, an extension of the ASP language with RDF streaming operators that allows users to express complex requirements, in terms of preferences and constraints, as a continuous reasoning request. The C-ASP reasoner is implemented to evaluate such requests continuously as new data arrives. The experimental evaluation shows that the C-ASP engine outperforms the state-of-the-art RDF stream processing engine C-SPARQL. For the second research question, which focuses on scalability, I optimize the reasoning process of the C-ASP reasoner with a parallel approach based on data-level parallelism, and I demonstrate how the correctness of the results can be maintained. To this end, a clear characterization and formal definitions for analyzing the dependencies among input data streams are provided, and algorithms are developed to create a partitioning plan that guides the parallel reasoning process in splitting data streams on the fly. Experiments show that applying this data-level parallelism significantly improves the reasoning process. The research discussed in this thesis has been deployed in two real-world scenarios: Smart Cities, where event-driven contextual knowledge extraction is introduced, and Smart Enterprise, where an Internet of Things-enabled meeting management system is developed. The former aims to continuously identify and filter critical events that might affect users' decision making, while the latter investigates how to enhance users' experience of online meetings on the go by using mobile sensors embedded in a communication platform. By addressing the requirements of these scenarios, the prototypes demonstrate the validity and feasibility of the approach proposed in this thesis.
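The idea of dependency-driven data-level parallelism described above can be illustrated with a deliberately simplified sketch. It is not the thesis's partitioning algorithm: it assumes that every rule relates facts about a single subject, and it partitions by predicate only. The names `partitioning_plan`, `split_window`, `toy_reason`, and `reason_in_parallel` are hypothetical, and `toy_reason` is a stand-in for a call to an ASP solver. Predicates that occur together in a rule are merged into one dependency component with a union-find structure, each window of triples is split by component, and the partitions are evaluated in parallel. Because no rule spans two components, the union of the per-partition answers equals the answer computed over the whole window.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]   # (subject, predicate, object)
Rule = Tuple[str, List[str]]    # (head predicate, body predicates), hypothetical encoding
Fact = Tuple[str, str]          # (subject, predicate)


class UnionFind:
    """Disjoint-set structure that groups predicates which depend on each other."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def partitioning_plan(rules: List[Rule]) -> Dict[str, str]:
    """Map each predicate to a representative of its dependency component."""
    uf = UnionFind()
    for head, body in rules:
        uf.find(head)
        for pred in body:
            uf.union(head, pred)  # head and body predicates must share a partition
    return {pred: uf.find(pred) for pred in list(uf.parent)}


def split_window(window: List[Triple], plan: Dict[str, str]) -> Dict[str, List[Triple]]:
    """Split one window into independent partitions, one per dependency component."""
    parts: Dict[str, List[Triple]] = defaultdict(list)
    for s, p, o in window:
        if p in plan:  # predicates used by no rule cannot affect any answer: drop them
            parts[plan[p]].append((s, p, o))
    return dict(parts)


def toy_reason(triples: List[Triple], rules: List[Rule]) -> Set[Fact]:
    """Stand-in for an ASP solver: derive head(s) whenever s has every body predicate."""
    facts = {(s, p) for s, p, _ in triples}
    changed = True
    while changed:  # iterate to a fixpoint so chained rules fire
        changed = False
        for head, body in rules:
            for s in {subj for subj, _ in facts}:
                if (s, head) not in facts and all((s, b) in facts for b in body):
                    facts.add((s, head))
                    changed = True
    heads = {h for h, _ in rules}
    return {f for f in facts if f[1] in heads}


def reason_in_parallel(window: List[Triple], rules: List[Rule], plan: Dict[str, str]) -> Set[Fact]:
    """Reason over each partition concurrently and merge the derived facts."""
    parts = split_window(window, plan)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda part: toy_reason(part, rules), parts.values())
        return set().union(*results)


if __name__ == "__main__":
    rules = [("congested", ["slowSpeed", "highVolume"]), ("alert", ["congested"]), ("wet", ["rain"])]
    window = [
        ("road1", "slowSpeed", "20"),
        ("road1", "highVolume", "900"),
        ("road2", "slowSpeed", "15"),
        ("road2", "rain", "true"),
        ("road3", "noise", "70"),
    ]
    plan = partitioning_plan(rules)
    print(sorted(reason_in_parallel(window, rules, plan)))
    # prints [('road1', 'alert'), ('road1', 'congested'), ('road2', 'wet')]
```

Partitioning by predicate alone is the coarsest safe choice in this simplified setting. A finer plan could also split on data values when the rules allow it, at the cost of a more involved dependency analysis.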