dc.contributor.author | Karim, Md. Rezaul | |
dc.contributor.author | Zappa, Achille | |
dc.contributor.author | Sahay, Ratnesh | |
dc.contributor.author | Rebholz-Schuhmann, Dietrich | |
dc.date.accessioned | 2018-08-07T08:58:17Z | |
dc.date.available | 2018-08-07T08:58:17Z | |
dc.date.issued | 2017-05-28 | |
dc.identifier.citation | Karim, Md. Rezaul, Zappa, Achille, Sahay, Ratnesh, & Rebholz-Schuhmann, Dietrich (2017). A Deep Learning Approach to Genomics Data for Population Scale Clustering and Ethnicity Prediction. Paper presented at the Proceedings of the ESWC workshop on Semantic Web solutions for large-scale biomedical data analytics (SeWeBMeDA), Portoroz, Slovenia, May 28, 2017. | en_IE |
dc.identifier.issn | 1613-0073 | |
dc.identifier.uri | http://hdl.handle.net/10379/7459 | |
dc.description.abstract | Understanding variations in genome sequences helps us identify
people who are predisposed to common diseases, solve rare diseases, and find
the corresponding population group of individuals within a larger population.
Although classical machine learning techniques allow researchers to identify groups
or clusters of related variables, the accuracy and effectiveness of these methods diminish
for large, high-dimensional datasets such as the whole human genome. On the other hand,
deep learning (DL) can build better representations of large-scale datasets and train models
that learn from these representations extensively. Furthermore, Semantic Web (SW)
technologies have already acted as useful adaptors in life sciences research for large-scale data
integration and querying. Thus, the standardized public data created using SW plays an
increasingly important role in life sciences research. In this paper, we propose a novel and
scalable genomic data analysis approach for population-scale clustering and geographic
ethnicity prediction using SW and DL-based techniques. We used genotype data from
the 1000 Genomes Project, resulting from whole-genome sequencing of
2504 individuals from 26 ethnic origins and comprising 84 million variants.
Experimental results in terms of accuracy and scalability show the effectiveness and
superiority of our approach compared to the state of the art. In particular, our deep-learning-based
analytics technique, using classification and clustering algorithms, can predict and group
targeted populations with a prediction accuracy of 98% and an ARI of 0.92, respectively. | en_IE |
dc.description.sponsorship | This publication has emanated from research conducted with the financial support of
Science Foundation Ireland (SFI) under the Grant Number SFI/12/RC/2289. | en_IE |
dc.format | application/pdf | en_IE |
dc.language.iso | en | en_IE |
dc.publisher | CEUR-WS.org | en_IE |
dc.relation.ispartof | SeWeBMeDA, ESWC | en |
dc.rights | Attribution-NonCommercial-NoDerivs 3.0 Ireland | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/3.0/ie/ | |
dc.subject | Population genomics | en_IE |
dc.subject | 1000 genome project | en_IE |
dc.subject | Genotype Classification | en_IE |
dc.subject | Deep learning | en_IE |
dc.subject | Semantic Web | en_IE |
dc.subject | Genotype clustering | en_IE |
dc.title | A deep learning approach to genomics data for population scale clustering and ethnicity prediction | en_IE |
dc.type | Conference Paper | en_IE |
dc.date.updated | 2018-07-10T17:40:46Z | |
dc.local.publishedsource | http://ceur-ws.org/Vol-1948/paper4.pdf | en_IE |
dc.description.peer-reviewed | non-peer-reviewed | |
dc.contributor.funder | Science Foundation Ireland | en_IE |
dc.internal.rssid | 14544126 | |
dc.local.contact | Ratnesh Nandan Sahay, DERI, Dangan Business Park, NUI Galway. 5253 Email: ratnesh.sahay@nuigalway.ie | |
dc.local.copyrightchecked | Yes | |
dc.local.version | ACCEPTED | |
dcterms.project | info:eu-repo/grantAgreement/SFI/SFI Research Centres/12/RC/2289/IE/INSIGHT - Irelands Big Data and Analytics Research Centre/ | |
nui.item.downloads | 205 | |