A Hybrid Framework for Querying Linked Data Dynamically

Date
2012-09-28
Author
Umbrich, Jürgen
Abstract
As of today, the Web has evolved to become the largest collection of information
made available by mankind. Researchers and developers are continuously working
on transforming this loosely connected data collection into a giant knowledge
base. As part of this trend, the Semantic Web community has started a movement
to transform the Web of unstructured text into the so-called 'Web of Data', a
framework to create, share and reuse data by humans and machines alike across
application, enterprise, and community boundaries. From this movement, Linked
Data has emerged as a set of best practices to publish, connect and discover structured
data on the Web using standard formats. As of today, there are over thirty
billion public facts which can be accessed, reused and combined by individuals as
well as organisations and companies.
As the Web of Data continues to expand and diversify, it becomes more and
more dynamic with data being constantly generated, removed and updated, e.g.,
from sensor/stream sources. New querying techniques are required to efficiently
keep up with this trend. While traditional approaches facilitate fast query times
by replicating Web data in optimised offline index structures, they cannot deal
efficiently with dynamic data and cannot guarantee up-to-date results. A new generation
of distributed Linked Data query engines addresses this problem and delivers
up-to-date results by retrieving query-relevant data immediately before or during
query execution. However, fetching data at runtime from potentially hundreds or
thousands of relevant Web sources is slow compared to optimised index lookups.
This thesis studies and improves distributed query approaches for Linked Data
and develops a hybrid query framework that offers fresh and fast query results by
combining centralised and distributed query techniques with a novel query planning
approach based on knowledge about the dynamicity of data.
We start by identifying the different levels of dynamicity within Linked Data
and highlight the challenges that centralised query approaches face in delivering
up-to-date results when operating over such dynamic data. We then present a study
of link-traversal-based query execution approaches for Linked Data and show how query
performance can be improved by providing reasoning extensions. We also develop
an approximate index structure that summarises the graph-structured content
of Web sources, and provide an algorithm that exploits this source summary
index. Finally, we propose and evaluate a novel hybrid query engine framework
that combines the execution strength of materialised query approaches with the
live results from distributed query approaches. The query planning phase uses a
cost model that combines standard selectivity and novel dynamicity estimates to
enable fast and fresh results.
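
The planning idea described above can be pictured with a minimal Python sketch.
It is an illustration under stated assumptions, not the cost model developed in
the thesis: the PatternEstimate class, the plan function, the staleness_threshold
parameter and the example statistics are all hypothetical. Each triple pattern
carries an assumed selectivity estimate (expected number of matches) and an
assumed dynamicity estimate (likelihood that cached matches are stale). The
planner orders patterns by selectivity and routes highly dynamic patterns to
live retrieval, leaving the rest to the materialised store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PatternEstimate:
    """Hypothetical per-triple-pattern statistics (not taken from the thesis)."""
    pattern: str       # triple pattern as text, e.g. "?s ex:currentLocation ?l"
    cardinality: int   # assumed selectivity estimate: expected number of matches
    dynamicity: float  # assumed likelihood in [0, 1] that cached matches are stale


def plan(estimates, staleness_threshold=0.5):
    """Return (pattern, source) pairs, most selective pattern first.

    Patterns whose dynamicity exceeds the threshold are marked "live"
    (fetched from the Web at query time); all others are marked "cache"
    (answered from the materialised store).
    """
    ordered = sorted(estimates, key=lambda e: e.cardinality)
    return [
        (e.pattern, "live" if e.dynamicity > staleness_threshold else "cache")
        for e in ordered
    ]


if __name__ == "__main__":
    # Made-up statistics for a two-pattern query.
    stats = [
        PatternEstimate("?s rdf:type foaf:Person", 10000, 0.05),
        PatternEstimate("?s ex:currentLocation ?l", 200, 0.9),
    ]
    for pattern, source in plan(stats):
        print(f"{source:5s}  {pattern}")
```

A single threshold is the simplest possible split between the two execution
modes. A fuller cost model would presumably also weigh the latency of live
lookups against the benefit of fresher results, which this sketch leaves out.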