Show simple item record

dc.contributor.author    Padmanabhuni, Shanmukha S.
dc.contributor.author    Iqbal, Aftab
dc.contributor.author    Decker, Stefan
dc.date.accessioned    2016-01-14T09:51:26Z
dc.date.available    2016-01-14T09:51:26Z
dc.date.issued    2014-12-03
dc.identifier.citation    Saleem, M, Padmanabhuni, SS, Ngomo, ACN, Iqbal, A, Almeida, JS, Decker, S, Deus, HF (2014) 'TopFed: TCGA tailored federated query processing and linking to LOD'. Journal of Biomedical Semantics, 5.    en_IE
dc.identifier.issn    2041-1480
dc.identifier.uri    http://hdl.handle.net/10379/5449
dc.description    Journal article    en_IE
dc.description.abstract    Background: The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer-related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such a large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. This makes it difficult to achieve virtual data integration and to collect the critical covariates necessary for analysis. Methods: We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Framework (RDF), linking it to relevant datasets in the Linked Open Data (LOD) cloud, and proposing an efficient data distribution strategy to host the resulting 20.4 billion triples via several SPARQL endpoints. With the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these endpoints through a TCGA-tailored federated SPARQL query processing engine named TopFed. Results: We compare TopFed with a well-established federation engine, FedX, in terms of source selection and query execution time, using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average fewer than half of the sources (with 100% recall), with a query execution time equal to one third of that of FedX. Conclusion: With TopFed, we aim to offer biomedical scientists a single point of access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA, as the amount and diversity of data exceed the ability of local resources to handle its retrieval and parsing.    en_IE
dc.description.sponsorship    German Research Foundation (DFG); Universität Leipzig    en_IE
dc.format    application/pdf    en_IE
dc.language.iso    en    en_IE
dc.publisher    BioMed Central    en_IE
dc.relation.ispartof    Journal Of Biomedical Semantics    en
dc.rights    Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri    https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject    Federated queries    en_IE
dc.subject    SPARQL    en_IE
dc.subject    TCGA    en_IE
dc.subject    RDF    en_IE
dc.subject    Cancer    en_IE
dc.title    TopFed: TCGA tailored federated query processing and linking to LOD    en_IE
dc.type    Article    en_IE
dc.date.updated    2016-01-10T22:51:45Z
dc.identifier.doi    10.1186/2041-1480-5-47
dc.local.publishedsource    http://dx.doi.org/10.1186/2041-1480-5-47    en_IE
dc.description.peer-reviewed    peer-reviewed
dc.contributor.funder    |~|
dc.internal.rssid    9392446
dc.local.contact    Chaudhry Muhammad Aftab Iqbal, Deri, Ida Business Park, Nui Galway. Email: chaudhrymuhammadaftab.iqbal@nuigalway.ie
dc.local.copyrightchecked    No
dc.local.version    PUBLISHED
nui.item.downloads    1425



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland