Machine learning and high-performance computing: Infrastructure and algorithms for the genome-scale study of genetic and epigenetic regulatory mechanisms with applications in neuroscience
|dc.contributor.author||Ó Broin, Pilib|
|dc.description.abstract||The advent of next-generation sequencing (NGS) has fundamentally changed modern genomics research. These sequencers generate terabytes of data and necessitate the use not only of high-performance compute (HPC) clusters for data processing and storage, but also of intelligent, scalable algorithms for pattern discovery and data mining. This thesis details the development of infrastructure and algorithms that automate much of this data analysis process, allowing bench biologists to remain focused on the scientific questions that drive them rather than on the informatics challenges associated with these new platforms. We describe WASP, one of the first end-to-end systems to handle all aspects of NGS data generation, including sample submission, laboratory information management system (LIMS) functionality, and assay-specific processing pipelines. Furthermore, we present two machine learning algorithms for the secondary analysis of ChIP-seq data: the first uses self-organising maps (SOMs) for improved de novo motif discovery, and the second uses genetic algorithms (GAs) to automatically cluster transcription factor binding motifs. Finally, we present an application of this infrastructure and these techniques to the study of the TBX1 transcription factor in 22q11.2 Deletion Syndrome, examining its putative role in neural development, adult neurogenesis, autism spectrum disorder (ASD), and schizophrenia.||en_US|
|dc.title||Machine learning and high-performance computing: Infrastructure and algorithms for the genome-scale study of genetic and epigenetic regulatory mechanisms with applications in neuroscience||en_US|
|dc.local.note||The decoding of the information contained within an individual's genome requires vast amounts of computing power as well as sophisticated algorithms. Here, we describe the development of such infrastructure and algorithms and demonstrate their application to the study of schizophrenia and autism spectrum disorder in a mouse model.||en_US|
This item is available under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Ireland licence. No item may be reproduced for commercial purposes. Please refer to the publisher's URL where this item is made available, or to the notes contained in the item itself. Other terms may apply.