Analysis of putative somatic mutations in 200,000 human exomes
Date
2023-09-19
Author
Bennett, Declan
Abstract
Somatic mutations accumulate throughout life and contribute significantly to disease
risk. While research into somatic mutation is well established in cancer, it is only in
recent years that investigations into the implications of somatic mutations in healthy
tissues have begun to be feasible, due to advances in sequencing technologies and protocols.
The requirement for specialist techniques has, however, limited the study of
somatic mutations in healthy tissues to small sample sizes, which do not allow for assessment
of the impact of somatic mutations on human health on a population scale. We
posited that it may be possible to study variation in the somatic mutation rate between
individuals and across the genome through analysis of low-depth sequencing data, by
developing strategies to distinguish the contribution of somatic mutations to the mismatches
(relative to the reference genome) observed in these data from sequencing
errors, DNA damage and other artefacts.
Using somatic mutation rates obtained from the literature, we estimated that 0.4%
of the mismatches between the UK Biobank exome sequencing reads and the reference
genome were due to somatic mutations. We demonstrated that this proportion
was sufficient to induce a relationship between the abundance of mismatches and age,
when individuals were grouped by integer age. We then searched for additional sample
properties that are correlated with the mismatch burden and found positive correlations
with cancer diagnosis and smoking status. However, by carefully examining the UK
Biobank exome sequencing data, we uncovered previously unreported batch effects relating
to sequencing run. The observed associations with cancer diagnosis and smoking
status were lost when we corrected for this batch effect. However, the batch correction
improved the correlation between age and mismatch load.
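
The grouping-by-age and batch-correction steps described above can be illustrated with a minimal sketch. It assumes a simple per-batch mean-centring as the correction, which is not necessarily the method used in the thesis, and it runs on invented toy data in which the mismatch load rises with age and one sequencing batch carries a constant offset. The field names (`age`, `batch`, `mismatches`) and function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def centre_by_batch(samples):
    """Subtract each batch's mean mismatch load from its samples.

    Assumption: a constant additive batch effect. This is an illustrative
    stand-in, not the correction used in the source.
    """
    by_batch = defaultdict(list)
    for s in samples:
        by_batch[s["batch"]].append(s["mismatches"])
    batch_mean = {b: mean(v) for b, v in by_batch.items()}
    return [dict(s, mismatches=s["mismatches"] - batch_mean[s["batch"]])
            for s in samples]


def age_correlation(samples):
    """Correlate integer age with the mean mismatch load of each age group."""
    by_age = defaultdict(list)
    for s in samples:
        by_age[int(s["age"])].append(s["mismatches"])
    ages = sorted(by_age)
    return pearson(ages, [mean(by_age[a]) for a in ages])


# Invented toy data: load = 0.1 * age, plus an offset of 10 in batch "B".
offset = {"A": 0.0, "B": 10.0}
design = [(40, "A"), (40, "B"), (50, "A"), (50, "A"),
          (60, "B"), (60, "B"), (70, "A"), (70, "B")]
toy_samples = [{"age": a, "batch": b, "mismatches": 0.1 * a + offset[b]}
               for a, b in design]

raw_r = age_correlation(toy_samples)
corrected_r = age_correlation(centre_by_batch(toy_samples))
print(f"raw r = {raw_r:.3f}, batch-corrected r = {corrected_r:.3f}")
```

In this toy design the batches are unevenly spread across ages, so the uncorrected per-age means are dominated by the batch offset, and centring each batch recovers most of the age trend.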
Individuals diagnosed with Lynch syndrome have increased somatic mutation loads
due to deficiencies in mismatch repair genes, and we investigated whether this effect
could be detected in the exome sequencing data. In the UK Biobank, we identified
160 individuals with pathogenic variants associated with Lynch syndrome. Using the
COSMIC signatures associated with mismatch repair, we compared the contribution
of mismatch repair mutational signatures between the Lynch syndrome samples and
the remaining samples. We detected a marginally statistically significant difference
in the contribution of SBS18 between the two sample groups; however, this result
did not survive multiple testing correction.
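
To show how a nominally significant result can fail multiple testing correction, the sketch below applies a Bonferroni adjustment to a set of per-signature p-values. The choice of Bonferroni is an assumption, since the abstract does not name the correction used, and the signature labels and p-values are invented placeholders rather than results from the thesis.

```python
def bonferroni(p_values, alpha=0.05):
    """Return {name: (adjusted_p, is_significant)} under Bonferroni correction.

    Each p-value is multiplied by the number of tests and capped at 1.0.
    """
    m = len(p_values)
    return {name: (min(p * m, 1.0), p * m <= alpha)
            for name, p in p_values.items()}


# Invented placeholder p-values for four hypothetical signatures.
toy_p = {"sig_a": 0.04, "sig_b": 0.30, "sig_c": 0.62, "sig_d": 0.81}

for name, (adj_p, significant) in bonferroni(toy_p).items():
    print(f"{name}: adjusted p = {adj_p:.2f}, significant = {significant}")
```

Here `sig_a` passes the uncorrected 0.05 threshold but not the corrected one, because its adjusted p-value is four times larger.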