Cache maintenance in federated query processing based on quality of service constraints
The Linked Open Data (LOD) Cloud forms a substantial and ever-increasing portion of the global knowledge on the World Wide Web by contributing many distributed data sources. Because of the diversity of the knowledge these sources expose, processing queries over distributed, autonomous sources can lead to interesting interdisciplinary discoveries, which makes it an important problem to address. As cloud computing gains momentum, everything is expected to be provided as a service, including query processing and knowledge extraction functionality. This requires scalable, available, and efficient implementations of query processing and knowledge extraction frameworks. However, Linked Data sources are distributed and autonomous, so federating queries over them results in availability and performance problems. Caching and replication are first-class solutions to these problems, but they introduce trade-offs among the quality metrics of the provided response (i.e., quality of service). In a query processor with a cache, these quality metrics are response latency and response consistency. In this thesis, I use caching to mitigate the availability and performance issues of processing queries over LOD. The thesis therefore addresses the problem of how to efficiently maintain a cache of distributed Linked Data while considering user-defined constraints on response consistency and latency. I decompose this problem into the following sub-problems and address them throughout the thesis. First, managing the consistency/latency trade-off requires metrics that quantify response consistency and latency; I discuss and motivate the metrics I use to quantify them. Second, I analyze the consistency/latency trade-off under a consistency constraint in the domain of Linked Data market systems. I propose a solution that estimates the response consistency based on the existing state of the local cache.
Moreover, I discuss how this solution can be leveraged to keep maintenance to a bare minimum, triggering it only when it is required to satisfy the consistency constraint. Third, I analyze the consistency/latency trade-off under latency constraints in the domain of stream processing, where latency is critically important. My proposed maintenance framework refreshes the cached entries that have the greatest influence on response freshness and thus maximizes response freshness while respecting latency constraints. Fourth, I relax the assumption of a cache large enough to accommodate all required data. This adds completeness to the trade-off as an additional dimension of response consistency. The extended maintenance framework maximizes freshness (and completeness) when both latency and cache space are constrained. I show that, with a fixed cache size, allowing higher latency leads to higher freshness (and completeness). Additionally, I show that the proposed maintenance framework outperforms the baselines on all evaluation metrics. The policies proposed in this thesis have been contributed to an existing open source stream processor. Experimental results show that the performance of the extended system improves significantly.
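As a purely illustrative sketch of what latency-constrained maintenance might look like, the Python fragment below greedily selects cached entries to refresh within a latency budget. It is not the policy developed in the thesis: the CacheEntry fields, the scoring rule (estimated staleness times influence, divided by refresh cost), and the function name select_refreshes are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    key: str
    stale_probability: float  # assumed estimate that the cached value is out of date
    influence: int            # assumed number of response rows depending on this entry
    refresh_cost_ms: float    # assumed cost of re-fetching from the remote source (> 0)


def select_refreshes(entries, latency_budget_ms):
    """Greedily pick entries with the highest expected freshness gain per millisecond."""
    ranked = sorted(
        entries,
        key=lambda e: (e.stale_probability * e.influence) / e.refresh_cost_ms,
        reverse=True,
    )
    chosen, spent = [], 0.0
    for entry in ranked:
        # Skip entries that no longer fit; cheaper ones later in the ranking may still fit.
        if spent + entry.refresh_cost_ms <= latency_budget_ms:
            chosen.append(entry.key)
            spent += entry.refresh_cost_ms
    return chosen


# Hypothetical cache state, for illustration only.
entries = [
    CacheEntry("a", 0.9, 10, 50.0),
    CacheEntry("b", 0.5, 2, 10.0),
    CacheEntry("c", 0.1, 1, 40.0),
    CacheEntry("d", 0.8, 5, 100.0),
]
print(select_refreshes(entries, latency_budget_ms=100.0))
```

A greedy gain-per-cost ranking is only one plausible heuristic; it makes the trade-off visible, since a larger latency budget allows more entries to be refreshed before the query is answered.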