dc.description.abstract | The diversity of the immunological repertoire has long been a subject of
research focus, providing important insights into the adaptive immune system.
Rapid developments in next-generation sequencing technologies have
revolutionized the way immunological repertoires are analyzed, providing
unprecedented high-resolution data. Nonetheless, these high-throughput
approaches also present unique computational challenges that must be
addressed through the development of accurate and efficient bioinformatics
pipelines. In this thesis, we demonstrated a complete bioinformatics workflow
for processing and analysis of high-throughput sequences from immune
receptors, and applied these tools to explore research questions relating to the
diversity of immune receptor genes in human populations.
An aspect of the immunological repertoire that is frequently of immediate
interest to immunologists is the distribution of different immune receptor
clonotypes among individuals, as knowledge of this could lead to a better
understanding of the dynamics of the immune system in different conditions.
We first implemented a bioinformatics pipeline to analyze next-generation
sequencing data from T cell receptors and immunoglobulins. This pipeline
featured an ultra-fast and accurate fast-tag-searching algorithm for VDJ
alignments, which outperformed all other similar pipelines in
benchmarking. In addition, this pipeline included two novel functional
components. The first function was polymorphism analysis, which reports
putative novel SNPs found in the input sequences. The second novel function
was the ability to construct lineage mutation trees to describe the affinity
maturation process of immunoglobulins.
No matter how sophisticated the alignment algorithms are, accurate gene
alignment always requires the right reference database. Unfortunately, the
IMGT database, which is the most widely used reference database in
immunological repertoire analysis pipelines, has been shown to be incomplete
and to contain numerous errors. Thus, the second task undertaken in this PhD
thesis was to create a more comprehensive reference database for T cell
receptors and immunoglobulin genes by exploiting the large volume of
publicly available human genome resequencing data generated in recent years.
Based on the variant calling information retrieved from the 1000 Genomes
Project and the current human reference genomes, we were able to infer a set
of putative alleles of immune receptor genes. Lym1k, our database of these
inferred alleles, provided a more comprehensive collection of immune
receptor alleles found in global human populations, as evidenced by a
significantly improved alignment performance on real datasets compared to
IMGT.
The immune receptor loci are among the most dynamic regions of the human
genome, with a high rate of structural variation, as well as high allelic diversity.
Previous analyses of the allelic diversity of immune receptor genes in global
human populations were constrained by the limited size of human genome
resequencing data. In our last research chapter, we addressed three research
questions relating to the allelic diversity of immune receptor genes. Firstly,
because many studies have shown that African populations have greater overall
allelic richness than other human populations, we compared the allelic
diversity of immune receptor genes between African and non-African
populations. Not surprisingly, the immune receptor alleles in African
populations were more diverse than those in non-African populations. Secondly,
because immune receptor genes of the same type are located adjacent to each
other on the chromosome, we investigated whether
genomic location was associated with allelic diversity, potentially reflecting
differences in the frequency of receptor gene use between genes located
towards the proximal or distal ends of the arrays of genes of a given type.
However, we did not find an effect of position on allelic diversity. Lastly, we
hypothesized that immune receptor genes that are more frequently selected
during rearrangement are under higher diversifying selection pressure, and
that this would lead to higher allelic diversity. Surprisingly, this
correlation was absent for most gene types; only TCRA genes showed weak
positive correlations.
In conclusion, this thesis demonstrated several novel high-throughput
approaches and strategies for immunological repertoire analysis. It also
addressed some important biological questions relating to the allelic diversity
of immune receptor genes by exploiting public biological resources, which
could potentially inform subsequent studies. | en_IE |