Estimation and analysis of gene expression and alternative splicing: perspectives from development and disease
Date: 2014-01-30
Author: Korir, Paul Kibet
Abstract
The development of high-throughput genomics technologies has contributed substantially to our understanding of gene expression regulation. With the growing appreciation of the importance of alternative splicing, quantitative techniques have had to keep pace with the demand for an accurate, high-resolution view of the transcriptome. In this thesis, we use results and methods from quantitative genomics to explore how gene regulation may be modified in development and disease.
Precise timing of gene expression can be critical for some biological processes, particularly for genes with oscillating expression patterns. Oscillations can arise from negative feedback loops in which there is a delay between gene activation and negative autoregulation. The time required to transcribe a gene contributes to this delay, and because introns lengthen the transcribed region, the intron content of genes involved in negative autoregulatory loops can be functionally significant. An example is Hes7, whose oscillatory expression is coupled to the formation of the segmented body plan during animal development. To identify further genes in which the transcriptional delay introduced by introns may be functionally significant, we searched for genes with conserved intron content across a diverse panel of 19 mammals and found that the set of genes with the most extreme conservation was enriched for genes involved in embryonic development. These genes had fewer insertions and deletions (indels) in their introns, and their cumulative insertions and deletions were balanced, suggesting that selection acts both to prevent indels in these introns and to balance the impact of insertions and deletions.
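The delay argument above can be illustrated with a minimal sketch. It assumes a generic single-variable model of delayed negative autoregulation, dp/dt = beta / (1 + (p(t - tau)/K)^n) - gamma p(t), integrated with a simple Euler scheme. This is a textbook model used only for illustration, not the analysis performed in the thesis (which was comparative and genomic), and the function names, Hill parameters, and 3 kb/min elongation rate are hypothetical values chosen for the example.

```python
def transcriptional_delay_minutes(gene_length_bp, elongation_rate_kb_per_min=3.0):
    """Time for RNA polymerase II to traverse a gene, in minutes.

    The default elongation rate is an assumed, illustrative value.
    """
    return gene_length_bp / (elongation_rate_kb_per_min * 1000.0)


def simulate_delayed_repression(tau, beta=10.0, K=1.0, n=4, gamma=1.0,
                                p0=0.5, dt=0.01, t_end=50.0):
    """Euler integration of dp/dt = beta / (1 + (p(t - tau)/K)**n) - gamma * p(t).

    p represses its own production after a delay tau; the history is held
    constant at p0 for t <= 0. All parameter values are hypothetical.
    """
    lag = int(round(tau / dt))
    steps = int(round(t_end / dt))
    p = [p0] * (lag + 1)  # grid values from t = -tau up to t = 0
    for _ in range(steps):
        p_delayed = p[-1 - lag]  # p(t - tau)
        production = beta / (1.0 + (p_delayed / K) ** n)
        p.append(p[-1] + dt * (production - gamma * p[-1]))
    return p[lag:]  # trajectory from t = 0 onward


def late_amplitude(trajectory, dt=0.01, window=20.0):
    """Peak-to-trough range over the final `window` time units."""
    tail = trajectory[-int(round(window / dt)):]
    return max(tail) - min(tail)


amp_no_delay = late_amplitude(simulate_delayed_repression(tau=0.0))
amp_with_delay = late_amplitude(simulate_delayed_repression(tau=2.0))
print(f"amplitude without delay: {amp_no_delay:.3g}")
print(f"amplitude with delay:    {amp_with_delay:.3g}")
```

Under these assumptions the trajectory without delay settles to a steady state, while a delay of 2 time units sustains oscillations; `transcriptional_delay_minutes(30_000)` returns 10.0 at the assumed rate. All else being equal, longer introns translate into a longer tau.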
There has been considerable success in mapping local (cis-acting) variants associated with phenotypes such as disease, and many disease-causing cis-acting variants disrupt splicing. Mapping distant (trans-acting) variants that affect splicing, however, is a more formidable task. Using high-density transcriptome microarrays, we show that a mutation in the splicing factor PRPF8 that is causally associated with retinitis pigmentosa acts as a trans-acting splicing variant. We estimate that in affected individuals up to 20% of all exons are mis-spliced, showing higher exon inclusion. Characteristics of the affected exons suggest that they tend to be spliced co-transcriptionally and via the exon-definition pathway.
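One way to detect higher exon inclusion on exon-resolution arrays is the splicing index, a widely used exon-array metric that compares an exon's signal, normalized to its gene's overall signal, between two groups of samples. The sketch below assumes that metric; it is not necessarily the procedure used in the thesis, and the function names, the 0.5 threshold, and the toy intensities are hypothetical.

```python
import math
from statistics import mean


def normalized_intensity(exon_signal, gene_signal):
    """log2 ratio of an exon's signal to its gene's overall signal in one sample."""
    return math.log2(exon_signal / gene_signal)


def splicing_index(exon_aff, gene_aff, exon_ctl, gene_ctl):
    """Mean normalized intensity in affected samples minus that in controls.

    Positive values indicate relatively higher inclusion of the exon in affected samples.
    """
    ni_aff = mean(normalized_intensity(e, g) for e, g in zip(exon_aff, gene_aff))
    ni_ctl = mean(normalized_intensity(e, g) for e, g in zip(exon_ctl, gene_ctl))
    return ni_aff - ni_ctl


def fraction_higher_inclusion(exons, threshold=0.5):
    """Fraction of exons whose splicing index exceeds a (hypothetical) threshold."""
    flagged = sum(1 for exon in exons if splicing_index(*exon) > threshold)
    return flagged / len(exons)


# Toy intensities invented for illustration (not data from the thesis).
# Each tuple is (exon_affected, gene_affected, exon_control, gene_control).
toy_exons = [
    ([200.0, 220.0], [100.0, 110.0], [100.0, 110.0], [100.0, 110.0]),  # more inclusion
    ([150.0, 165.0], [100.0, 110.0], [150.0, 165.0], [100.0, 110.0]),  # unchanged
]
print(fraction_higher_inclusion(toy_exons))  # 0.5
```

A positive splicing index marks an exon that is relatively more abundant in affected samples than in controls; in practice a statistical test across replicates would typically accompany a fixed threshold.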
The importance of gene expression microarrays in quantitative genomics has led to the development of numerous algorithms for estimating gene expression from raw microarray intensities. However, microarrays have several shortcomings relative to more recently developed sequencing-based methods for measuring gene expression. We exploited the strengths of quantitative transcriptome sequencing (RNA-Seq) by using a statistical learning approach, based on a high-quality dataset for which both microarray and RNA-Seq data are available, to obtain better expression estimates from arrays. Our analyses show that this approach compares favourably with existing algorithms for microarray analysis, with the added advantage of providing estimates of the abundance of individual transcript isoforms on an absolute scale.
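Learning array-based estimates from paired RNA-Seq data can be sketched as supervised regression. The example below assumes multi-output ridge regression from log-scale probe intensities to log-scale RNA-Seq isoform abundances. Ridge regression is a swapped-in technique for illustration, not necessarily the model used in the thesis, and the synthetic data, dimensions, and function names are hypothetical.

```python
import numpy as np


def fit_ridge(X, Y, lam=1.0):
    """Fit Y ~ X @ W + b by ridge regression (intercept handled by centering).

    X: samples x probes (e.g. log2 probe intensities)
    Y: samples x isoforms (e.g. log2 RNA-Seq abundances)
    """
    x_mean = X.mean(axis=0)
    y_mean = Y.mean(axis=0)
    Xc = X - x_mean
    Yc = Y - y_mean
    n_features = X.shape[1]
    # Closed-form ridge solution: (Xc^T Xc + lam I)^{-1} Xc^T Yc
    W = np.linalg.solve(Xc.T @ Xc + lam * np.eye(n_features), Xc.T @ Yc)
    b = y_mean - x_mean @ W
    return W, b


def predict(X, W, b):
    """Predict isoform abundances for arrays that have no RNA-Seq data."""
    return X @ W + b


# Synthetic stand-in for a paired array/RNA-Seq dataset (invented for illustration).
rng = np.random.default_rng(0)
n_train, n_test, n_probes, n_isoforms = 60, 10, 8, 3
W_true = rng.normal(size=(n_probes, n_isoforms))
b_true = rng.normal(size=n_isoforms)
X_train = rng.normal(size=(n_train, n_probes))
X_test = rng.normal(size=(n_test, n_probes))
Y_train = X_train @ W_true + b_true
Y_test = X_test @ W_true + b_true

W, b = fit_ridge(X_train, Y_train, lam=1e-6)
max_error = np.abs(predict(X_test, W, b) - Y_test).max()
print(f"max abs error on held-out arrays: {max_error:.2e}")
```

Once fitted on samples measured by both technologies, the same weights can be applied to arrays without matching RNA-Seq data; because the targets are on the RNA-Seq scale, the predictions inherit that scale and its isoform resolution.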