Querying web polystores

Khan, Ya; Zimmermann, Antoine; Jha, AlokKumar; Rebholz-Schuhmann, Dietrich; Sahay, Ratnesh

Date

2017-12-11

Recommended Citation

Khan, Y., Zimmermann, A., Jha, A., Rebholz-Schuhmann, D., & Sahay, R. (2017, December 11-14). Querying web polystores. Paper presented at the 2017 IEEE International Conference on Big Data (Big Data).

Published Version

http://dx.doi.org/10.1109/BigData.2017.8258299

Abstract

The database, semantic web, and linked data communities have proposed solutions that federate queries over multiple data sources using a single data model. Nowadays, the data retrieval requirements of versatile and broad domains such as healthcare and life sciences (HCLS) are changing this conventional trend of federating queries over a single data model, primarily because different data models (CSV, JSON, RDB, RDF, XML, etc.) are used simultaneously in real-life scenarios. It is now impractical to assume that high-volume data of varied kinds (graph, key-value, stream, text, table, tree, etc.) residing in specialised storage engines will first be converted to a common data model, stored in a general-purpose data storage engine, and only then be queried over the Web. Moreover, in an era where genomics datasets are growing from petascale to exascale, it is important to exploit such vast domain resources in their native data models. The key approach is therefore to query these resources directly in their native data models and specialised storage engines. In this paper, we propose a Web-based query federation mechanism, called PolyWeb, that unifies query answering over multiple native data models (CSV, RDB, and RDF). We demonstrate PolyWeb on a cancer genomics use case, where descriptions of biological and chemical entities (e.g., genes, diseases, drugs, pathways) often span multiple data models. To assess the benefits and limitations of evaluating queries over native data models, we compare PolyWeb with a state-of-the-art query federation engine in terms of result completeness, source selection, and overall query execution time.
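The abstract describes answering one query over sources kept in their native data models (CSV, RDB, RDF) instead of converting them to a common model first. As an illustration only, the Python sketch below shows that general idea on toy data: each source is read through its own wrapper (the csv module for CSV, SQL via sqlite3 for the relational store, a triple-pattern match over an RDF-like list of triples), a small catalogue performs source selection, and the partial results are joined on the gene symbol. The catalogue, the wrapper functions, the `ex:` vocabulary, and all data values are hypothetical assumptions; this is not PolyWeb's actual query language, architecture, or API, none of which the abstract specifies.

```python
import csv
import io
import sqlite3

# Toy CSV source (hypothetical data): gene -> associated disease.
CSV_TEXT = """gene,disease
BRCA1,breast cancer
EGFR,non-small cell lung cancer
"""

# Toy RDF source (hypothetical data), modelled as (subject, predicate, object) triples.
RDF_TRIPLES = [
    ("ex:EGFR", "ex:targetedBy", "gefitinib"),
    ("ex:ERBB2", "ex:targetedBy", "trastuzumab"),
]

# Hypothetical catalogue used for source selection: attribute -> source holding it.
CATALOGUE = {"disease": "csv", "pathway": "rdb", "drug": "rdf"}


def make_rdb():
    """Build a toy relational source in memory (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pathway (gene TEXT, pathway TEXT)")
    conn.executemany(
        "INSERT INTO pathway VALUES (?, ?)",
        [("BRCA1", "DNA repair"), ("EGFR", "ErbB signaling")],
    )
    return conn


def fetch_csv(conn):
    """Read the CSV source in its native form; conn is unused here."""
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    return {row["gene"]: row["disease"] for row in reader}


def fetch_rdb(conn):
    """Answer the sub-query with SQL against the relational source."""
    return dict(conn.execute("SELECT gene, pathway FROM pathway"))


def fetch_rdf(conn):
    """Match the triple pattern (?gene, ex:targetedBy, ?drug); conn is unused here."""
    return {
        s.split(":", 1)[1]: o
        for s, p, o in RDF_TRIPLES
        if p == "ex:targetedBy"
    }


# One wrapper per source; each returns a {gene: value} mapping.
WRAPPERS = {"csv": fetch_csv, "rdb": fetch_rdb, "rdf": fetch_rdf}


def selected_sources(attributes):
    """Source selection: only sources exposing a requested attribute are contacted."""
    return sorted({CATALOGUE[a] for a in attributes})


def federated_query(attributes, conn):
    """Call only the selected wrappers, then inner-join their results on gene symbol."""
    partials = {a: WRAPPERS[CATALOGUE[a]](conn) for a in attributes}
    genes = set.intersection(*(set(p) for p in partials.values()))
    return {g: {a: partials[a][g] for a in attributes} for g in sorted(genes)}


if __name__ == "__main__":
    conn = make_rdb()
    print(selected_sources(["disease", "drug"]))
    print(federated_query(["disease", "drug"], conn))
```

Because the sketch uses an inner join, the `["disease", "drug"]` query returns only EGFR, the one gene present in both the CSV and RDF sources, and the relational source is never contacted. In a sketch like this, the choice of join semantics directly affects result completeness, one of the criteria the abstract lists.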

URI

http://hdl.handle.net/10379/7458

Collections

Data Science Institute (Conference Papers)

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland