Evaluating and benchmarking the performance of federated SPARQL endpoints and their partitioning using selected metrics and specific query types

Rakhmawati, Nur Aini

dc.contributor.advisor	Decker, Stefan
dc.contributor.author	Rakhmawati, Nur Aini
dc.date.accessioned	2017-02-08T08:44:49Z
dc.date.available	2017-02-08T08:44:49Z
dc.date.issued	2017-02-07
dc.identifier.uri	http://hdl.handle.net/10379/6284
dc.description.abstract	The increasing amount of Linked Data and its inherently distributed nature have created a need to research and develop querying technologies. Inspired by research results from traditional distributed databases, different approaches for managing federation over SPARQL endpoints have been introduced. Such a system consists of a federated engine acting as the query mediator and a group of SPARQL endpoints acting as the data providers. SPARQL is the standardised query language for RDF, the default data model used in Linked Data deployments, and SPARQL endpoints are a popular access mechanism provided by many RDF repositories. The growing number of federated SPARQL query systems creates a need for benchmarks to evaluate their performance. Designing a benchmark for a federated SPARQL query system is a non-trivial task, since such a system consists of heterogeneous components (e.g. hardware, software, data structure and data distribution) which are also distributed. In this thesis, we design a comprehensive benchmark based on the dependencies between the metrics, datasets and queries. We initially investigate existing federated engines and compare their features and behaviours. Based on this investigation, we first identify the metrics that are suitable for assessing the performance of federated SPARQL query systems. We introduce three types of metrics: independent metrics, semi-independent metrics and composite metrics. We then investigate the benefits and costs of federating a SPARQL query over multiple interlinked sources in the existing federated engines. Next, we present six approaches to generating a dataset for benchmarking federated SPARQL queries. Using those approaches, we generate nine datasets and observe the relationship between the spreading factor of those datasets and the communication cost.
The spreading factor is a dataset metric for computing the distribution of classes and properties throughout a set of data sources. Finally, we present QFed, a dynamic SPARQL query set generator for federated SPARQL query benchmarks that takes into account the characteristics of both datasets and queries along with the metrics.	en_IE
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri	https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject	SPARQL	en_IE
dc.subject	Semantic web	en_IE
dc.subject	Linked data	en_IE
dc.subject	Federated SPARQL query	en_IE
dc.subject	Benchmark	en_IE
dc.subject	Data analytics	en_IE
dc.title	Evaluating and benchmarking the performance of federated SPARQL endpoints and their partitioning using selected metrics and specific query types	en_IE
dc.type	Thesis	en_IE
dc.contributor.funder	Indonesian Directorate General of Higher Education Scholarship	en_IE
dc.contributor.funder	IRCSET Postgraduate Scholarship by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289	en_IE
dc.local.final	Yes	en_IE
nui.item.downloads	1213

Files in this item

Name: license.txt
Size: 5.659Kb
Format: Text file

Name: 2017RakhmawatiPhD.pdf
Size: 1.313Mb
Format: PDF

This item appears in the following Collection(s)

University of Galway Theses (PhD Theses)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland