dc.contributor.author    Robbins, D. E.
dc.contributor.author    Gruneberg, A.
dc.contributor.author    Deus, H. F.
dc.contributor.author    Tanik, M. M.
dc.contributor.author    Almeida, J. S.
dc.date.accessioned      2018-09-20T16:22:56Z
dc.date.available        2018-09-20T16:22:56Z
dc.date.issued           2013-04-17
dc.identifier.citation   Robbins, D. E.; Gruneberg, A.; Deus, H. F.; Tanik, M. M.; Almeida, J. S. (2013). A self-updating road map of the cancer genome atlas. Bioinformatics 29 (10), 1333-1340
dc.identifier.issn       1367-4803, 1460-2059
dc.identifier.uri        http://hdl.handle.net/10379/13663
dc.description.abstract  Motivation: Since 2011, files from The Cancer Genome Atlas (TCGA) have been accessible over HTTP from a public site, creating entirely new possibilities for cancer informatics by enhancing data discovery and retrieval. Significantly, these enhancements enable the reporting of analysis results that can be fully traced to, and reproduced from, their source data. Realizing this possibility, however, requires a continually updated road map of the files in the TCGA. Creating such a road map is a significant data modeling challenge because of the size and fluidity of this resource: each of the 33 cancer types is instantiated in only partially overlapping sets of analytical platforms, while the number of available data files doubles approximately every 7 months. Results: We developed an engine to index and annotate the TCGA files, relying exclusively on third-generation web technologies (Web 3.0). Specifically, this engine uses JavaScript in conjunction with the World Wide Web Consortium's (W3C) Resource Description Framework (RDF) and SPARQL, the query language for RDF, to capture the metadata of files in the TCGA open-access HTTP directory. The resulting index can be queried with SPARQL; it enables file-level provenance annotations as well as discovery of arbitrary subsets of files based on their metadata, using web-standard languages. In turn, these abilities enhance the reproducibility and distribution of novel results delivered as elements of a web-based computational ecosystem. Developing the TCGA Roadmap engine provided specific clues about how biomedical big data initiatives should be exposed as public resources for exploratory analysis, data mining and reproducible research. These design elements align with the concept of knowledge reengineering and represent a sharp departure from top-down approaches in grid initiatives such as caBIG. They also offer a much more interoperable and reproducible alternative to the still-pervasive use of data portals.
dc.publisher             Oxford University Press (OUP)
dc.relation.ispartof     Bioinformatics
dc.subject               glioblastoma
dc.subject               mapreduce
dc.subject               core
dc.title                 A self-updating road map of the cancer genome atlas
dc.type                  Article
dc.identifier.doi        10.1093/bioinformatics/btt141
dc.local.publishedsource https://academic.oup.com/bioinformatics/article-pdf/29/10/1333/720034/btt141.pdf