Fuzzy k-mers and their application to comparative genome assembly
MetadataShow full item record
This item's downloads: 1786 (view details)
The application of k-mer matching to problems in the field of bioinformatics is long established, with k-mer techniques underpinning standard heuristic approaches to sequence alignment and genome assembly. Despite their broad application, conventional k-mer matching techniques lack a native mechanism for accommodating sequence variability, requiring an exact match at pre-defined indices in a k-mer seed. This thesis presents a fuzzy approach for approximate k-mer matching and investigates its application to sequence alignment and comparative assembly. By combining the speed of hashing with the sensitivity of dynamic programming, fuzzy k-mers unify the two phases of the 'seed and extend' strategy into a single operation that executes in average constant time. In contrast with existing methods of k-mer matching, fuzzy k-mers provide native support for string variability. The fuzzy approach has been implemented in a prototype sequence aligner and genome assembler called Ferox. In addition to their exploitation for sequence alignment, the prototype directly integrates fuzzy k-mer alignments into the contig construction process by combining models of de novo and comparative genome assembly.