Analysis of clonal mutations in cancer as a means of studying variation in somatic mutation processes
Date
2023-09-13
Author
Cleary, Siobhán
Abstract
Somatic mutations are mutations that arise throughout a person’s lifetime.
They contribute to ageing, cancer and other age-related disorders. Recent
technological advances have led to many studies investigating somatic mutations
in normal tissues. However, somatic mutations are hard to identify in normal
tissues due to their low frequency and the difficulty of distinguishing between real
mutations and errors incorporated during the experimental processes. Studies
of somatic mutations in normal tissues suggest that there is still much unknown
about how somatic mutations contribute to cancer. Somatic mutations
can be studied by analysing cancer samples. Generally, somatic mutations in
cancer samples are studied to understand cancer progression and response to
treatment. This thesis aimed to investigate somatic mutations present in all
cancer cells of a sample (clonal mutations) as a means to understand what is
happening in normal tissue.
Chapter 2 describes a method to predict the total clonal mutation load
of a cancer sample and the use of this approach to investigate the relationship
between variation in clonal somatic mutation load and differences between
tissues in the risk of developing cancer. Before predicting the total clonal
load, we first needed to distinguish between clonal mutations and mutations
present in only a subset of cells (subclonal). We adjusted variant frequency
for tumour purity and local copy number variation to classify variants as
clonal or subclonal. We used the linear relationship between clonal variants
and age to predict the total clonal burden for each tissue type. Under the
assumption that subclonal mutation accumulation does not correlate with
age, we determined what proportion of true clonal variants were classified
as clonal. By adjusting various thresholds for classifying variants as clonal
variants, we could classify, at best, 45% of the true clonal variants. We then
used the relationship between clonal mutation burden and age to estimate the
true clonal load for our samples. To investigate whether the estimated clonal
mutation burden could be used as a proxy for the number of somatic mutations
in healthy cells, we compared our results to somatic mutation burdens that
have been measured directly in normal tissues (matched for age and tissue
type with the cancer samples). We also found that the predicted clonal load
was correlated with lifetime cancer risk. Our findings suggest that the
predicted clonal load from cancer samples can be used to investigate somatic
mutations in normal tissue. This approach has the advantage of drawing on the
large volume of cancer genomics data that has already been generated to extend
our understanding of the accumulation of somatic mutations in normal tissues.
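
The purity and copy-number adjustment described for Chapter 2 can be illustrated with a short sketch. The abstract does not give the formula or thresholds used in the thesis, so the code below assumes the commonly used cancer cell fraction (CCF) estimate. The function names, the default mutation multiplicity of 1, the normal copy number of 2 and the 0.9 clonal threshold are all illustrative assumptions, not values taken from the thesis.

```python
def cancer_cell_fraction(vaf, purity, tumour_cn, normal_cn=2, multiplicity=1):
    """Estimate the fraction of cancer cells carrying a variant.

    Assumed model (not taken from the thesis): the expected variant allele
    frequency is
        vaf = purity * multiplicity * ccf
              / (purity * tumour_cn + (1 - purity) * normal_cn)
    which is rearranged here to solve for ccf.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in the interval (0, 1]")
    if tumour_cn <= 0 or multiplicity <= 0:
        raise ValueError("copy number and multiplicity must be positive")
    # Average number of copies of the locus per cell in the mixed sample
    copies_per_cell = purity * tumour_cn + (1 - purity) * normal_cn
    return vaf * copies_per_cell / (purity * multiplicity)


def classify_variant(vaf, purity, tumour_cn, clonal_threshold=0.9, **kwargs):
    """Label a variant 'clonal' or 'subclonal' using a hypothetical cutoff."""
    ccf = cancer_cell_fraction(vaf, purity, tumour_cn, **kwargs)
    return "clonal" if ccf >= clonal_threshold else "subclonal"
```

Under these assumptions, a variant observed at a frequency of 0.25 in a diploid region of a sample with 50% purity has an estimated CCF of 1.0 and is labelled clonal, whereas the same variant at a frequency of 0.1 has a CCF of 0.4 and is labelled subclonal. Changing `clonal_threshold` trades off how many true clonal variants are recovered against how many subclonal variants are misclassified.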
The major histocompatibility complex (MHC) can present neoantigens
resulting from somatic mutations on the cell surface, potentially directing
an immune response against the cell. In Chapter 3, we investigated whether gene
expression explains the lack of signal of immunoediting observed among clonal
passenger mutations. This hypothesis stemmed from two publications that
reported that driver mutations arise in gaps in the capacity of the immune
system to recognize them. We investigated whether passenger mutations capable
of eliciting an immune response occur preferentially in lowly expressed genes,
or whether the mutant allele is expressed at a lower level than the reference
allele, a phenomenon termed allele-specific expression (ASE). The neoantigen
must be expressed to be presented by the MHC on the cell surface, so a reduction
in expression could be a means by which the immunogenic mutations
are tolerated. After accounting for gene length and sequence context, we
found no difference in the expression of genes harbouring immunogenic mutations
compared to nonimmunogenic or synonymous mutations. Additionally,
there was no evidence that the mutant allele exhibited ASE more often for
immunogenic mutations than nonimmunogenic mutations. Using simulations,
we also estimated an upper bound for the impact of immunoediting on the
mutational landscape in cancer, showing that at most 5% of missense mutations
could be removed by this process. To our knowledge, this was the first
attempt to quantify the proportion of missense mutations removed through
immunoediting.
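
Testing whether a mutant allele shows reduced expression comes down to asking whether it is under-represented among RNA reads covering the variant. The abstract does not state which test the thesis used; the sketch below assumes a one-sided exact binomial test, which is one common choice. The function name and the default expected fraction of 0.5 are illustrative assumptions, and in a tumour sample the expected fraction would normally need adjusting for purity and copy number.

```python
from math import comb


def mutant_underexpression_pvalue(mutant_reads, total_reads, expected_fraction=0.5):
    """One-sided exact binomial p-value for a low mutant-allele read count.

    Returns P(X <= mutant_reads) for X ~ Binomial(total_reads, expected_fraction).
    A small value suggests the mutant allele is expressed less than expected.
    The 0.5 default is an assumption that ignores purity and copy number.
    """
    if total_reads <= 0 or not 0 <= mutant_reads <= total_reads:
        raise ValueError("read counts must satisfy 0 <= mutant_reads <= total_reads, total_reads > 0")
    if not 0 < expected_fraction < 1:
        raise ValueError("expected_fraction must be strictly between 0 and 1")
    p = expected_fraction
    # Sum the binomial probabilities of observing mutant_reads or fewer
    return sum(
        comb(total_reads, i) * p**i * (1 - p) ** (total_reads - i)
        for i in range(mutant_reads + 1)
    )
```

A call such as `mutant_underexpression_pvalue(2, 30)` gives the probability of seeing two or fewer mutant reads out of 30 if both alleles were expressed equally. Comparing how often this p-value falls below a chosen cutoff for immunogenic versus nonimmunogenic mutations is one way the comparison described above could be set up.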
Finally, in Chapter 4, we extended our analysis of the relationship
between gene expression and somatic mutation accumulation by investigating
the relationship between germline ASE and cancer risk. Here, we investigated
the hypothesis that a single score representing germline ASE in all TSGs for
an individual would be associated with an increased cancer risk because only
mutations on the expressed copy would be required to disrupt the function
of the gene. To assess this, we first tested the ability of two methods to
predict ASE using genotype data. We modified a tool called PrediXcan which
predicts overall gene expression to predict the expression of each haplotype
and generated a ratio with the predicted values. We also applied logistic
regression models using heterozygous SNP status as predictors and ASE status
as the outcome. Although the performance of ASE predictions was poor for
many genes using both methods, our results indicate that it may be possible
to generate more accurate predictions using genotype data as input as more
data becomes available. As a pilot study, we generated a single TSG ASE
score using the genes for which the predictions worked well and assessed the
relationship with breast cancer risk. We found no statistically significant
relationship between TSG ASE and cancer risk, which is likely due to our
inability to predict ASE in the TSGs that contribute to cancer risk in this
tissue type, as assessed using cancer data.
In conclusion, this thesis presented a novel approach to predict the
true clonal load of cancer samples and demonstrated its similarity to the
observed somatic mutation load in normal tissue. We also provided further
insight into the role of the immune system in shaping the mutational landscape
of cancer samples and, using a novel method, generated an estimate for the
proportion of missense mutations removed through immunoediting. Finally,
we presented a novel approach to predict germline ASE using genotype
data, showing that it is feasible for some genes and that performance is
likely to improve as more data becomes available.