Fuzzy k-mers and their application to comparative genome assembly
View/ Open
Date
2013-10-22Author
Healy, John
Metadata
Show full item recordUsage
This item's downloads: 1796 (view details)
Abstract
The application of k-mer matching to problems in the field of bioinformatics is long
established, with k-mer techniques underpinning standard heuristic approaches to
sequence alignment and genome assembly. Despite their broad application,
conventional k-mer matching techniques lack a native mechanism for accommodating
sequence variability, requiring an exact match at pre-defined indices in a k-mer seed.
This thesis presents a fuzzy approach for approximate k-mer matching and
investigates its application to sequence alignment and comparative assembly. By
combining the speed of hashing with the sensitivity of dynamic programming, fuzzy
k-mers unify the two phases of the 'seed and extend' strategy into a single operation
that executes in average constant time. In contrast with existing methods of k-mer
matching, fuzzy k-mers provide native support for string variability.
The fuzzy approach has been implemented in a prototype sequence aligner and
genome assembler called Ferox. In addition to their exploitation for sequence
alignment, the prototype directly integrates fuzzy k-mer alignments into the contig
construction process by combining models of de novo and comparative genome
assembly.