dc.contributor.advisor: Decker, Stefan
dc.contributor.author: Rakhmawati, Nur Aini
dc.date.accessioned: 2017-02-08T08:44:49Z
dc.date.available: 2017-02-08T08:44:49Z
dc.date.issued: 2017-02-07
dc.identifier.uri: http://hdl.handle.net/10379/6284
dc.description.abstract: The increasing amount of Linked Data and its inherently distributed nature have created a need to research and develop querying technologies. Inspired by research results from traditional distributed databases, different approaches for managing federation over SPARQL endpoints have been introduced. Such a system consists of a federated engine as the query mediator and a group of SPARQL endpoints as the data providers. SPARQL is the standardised query language for RDF, the default data model used in Linked Data deployments, and SPARQL endpoints are a popular access mechanism provided by many RDF repositories. The growing number of federated SPARQL query systems creates the need for benchmarking systems to evaluate their performance. Designing a benchmark for a federated SPARQL query system is a non-trivial task, since such a system consists of heterogeneous systems (e.g. hardware, software, data structure and data distribution) that are also distributed. In this thesis, we design a comprehensive benchmark based on the dependencies between the metrics, datasets and queries. We initially investigate existing federated engines and compare their features and behaviours. Based on this investigation, we first identify the metrics that are suitable for assessing the performance of federated SPARQL query systems. We introduce three types of metrics: independent metrics, semi-independent metrics and composite metrics. Thereafter, we investigate the benefits and costs associated with federating a SPARQL query over multiple sources that have links between them in the existing federated engines. Next, we present six approaches to generating a dataset for benchmarking federated SPARQL queries. Using those approaches, we generate nine datasets and observe the relationship between the spreading factor of those datasets and the communication cost. The spreading factor is a dataset metric for computing the distribution of classes and properties throughout a set of data sources. Finally, we present QFed, a dynamic SPARQL query set generator for federated SPARQL query benchmarks that takes into account the characteristics of both datasets and queries along with the metrics. [en_IE]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: SPARQL [en_IE]
dc.subject: Semantic web [en_IE]
dc.subject: Linked data [en_IE]
dc.subject: Federated SPARQL query [en_IE]
dc.subject: Benchmark [en_IE]
dc.subject: Data analytics [en_IE]
dc.title: Evaluating and benchmarking the performance of federated SPARQL endpoints and their partitioning using selected metrics and specific query types [en_IE]
dc.type: Thesis [en_IE]
dc.contributor.funder: Indonesian Directorate General of Higher Education Scholarship [en_IE]
dc.contributor.funder: IRCSET Postgraduate Scholarship by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289 [en_IE]
dc.local.final: Yes [en_IE]
nui.item.downloads: 1213


Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland