
dc.contributor.author: Karim, Md. Rezaul
dc.contributor.author: Zappa, Achille
dc.contributor.author: Sahay, Ratnesh
dc.contributor.author: Rebholz-Schuhmann, Dietrich
dc.date.accessioned: 2018-08-07T08:58:17Z
dc.date.available: 2018-08-07T08:58:17Z
dc.date.issued: 2017-05-28
dc.identifier.citation: Karim, Md. Rezaul, Zappa, Achille, Sahay, Ratnesh, & Rebholz-Schuhmann, Dietrich (2017). A Deep Learning Approach to Genomics Data for Population Scale Clustering and Ethnicity Prediction. Paper presented at the Proceedings of the ESWC workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), Portoroz, Slovenia, May 28, 2017. [en_IE]
dc.identifier.issn: 1613-0073
dc.identifier.uri: http://hdl.handle.net/10379/7459
dc.description.abstract: Understanding variations in genome sequences helps us identify people who are predisposed to common diseases, solve rare diseases, and find the population group to which an individual belongs within a larger population. Although classical machine learning techniques allow researchers to identify groups or clusters of related variables, the accuracy and effectiveness of these methods diminish for large, high-dimensional datasets such as the whole human genome. Deep learning (DL), on the other hand, can learn better representations of large-scale datasets and build models on those representations. Furthermore, Semantic Web (SW) technologies have already served as useful adaptors in life science research for large-scale data integration and querying. Thus, standardized public data created using SW plays an increasingly important role in life sciences research. In this paper, we propose a novel and scalable genomic data analysis approach for population-scale clustering and geographic ethnicity prediction using SW and DL-based techniques. We used genotype data from the 1000 Genomes Project, obtained by whole-genome sequencing of 2504 individuals and comprising 84 million variants across 26 ethnic origins. Experimental results in terms of accuracy and scalability show the effectiveness and superiority of our approach compared to the state of the art. In particular, our deep-learning-based analytics technique, using classification and clustering algorithms, can predict and group targeted populations with a prediction accuracy of 98% and an adjusted Rand index (ARI) of 0.92, respectively. [en_IE]
dc.description.sponsorship: This publication has emanated from research conducted with the financial support of Science Foundation Ireland (SFI) under the Grant Number SFI/12/RC/2289. [en_IE]
dc.format: application/pdf [en_IE]
dc.language.iso: en [en_IE]
dc.publisher: CEUR-WS.org [en_IE]
dc.relation.ispartof: SeWeBMeDA, ESWC [en]
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Ireland
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/3.0/ie/
dc.subject: Population genomics [en_IE]
dc.subject: 1000 genome project [en_IE]
dc.subject: Genotype Classification [en_IE]
dc.subject: Deep learning [en_IE]
dc.subject: Semantic Web [en_IE]
dc.subject: Genotype clustering [en_IE]
dc.title: A deep learning approach to genomics data for population scale clustering and ethnicity prediction [en_IE]
dc.type: Conference Paper [en_IE]
dc.date.updated: 2018-07-10T17:40:46Z
dc.local.publishedsource: http://ceur-ws.org/Vol-1948/paper4.pdf [en_IE]
dc.description.peer-reviewed: non-peer-reviewed
dc.contributor.funder: Science Foundation Ireland [en_IE]
dc.internal.rssid: 14544126
dc.local.contact: Ratnesh Nandan Sahay, Deri, Dangan Business Park, Nui Galway. 5253 Email: ratnesh.sahay@nuigalway.ie
dc.local.copyrightchecked: Yes
dc.local.version: ACCEPTED
dcterms.project: info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/
nui.item.downloads: 205



Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Ireland