Comparison of human genomes shows that, along with single nucleotide polymorphisms and small indels, larger structural variants (SVs) are common. Recent studies even suggest that more base pairs are altered by structural variations (including copy number variations) than by single nucleotide variations or small indels. It has also been known that structural variations can cause a loss or gain of functionality and can have phenotypic effects. The advent of high-throughput sequencing technologies has revolutionized the field of genomics. These platforms now make it feasible to detect the full spectrum of genomic variation (including SVs) among many individual genomes, including those of cancer patients and others suffering from diseases of genomic origin. They also make it possible to extend the scope of structural variation studies to a previously unimaginable scale, as exemplified by the 1000 Genomes Project. In this dissertation we consider the structural variation discovery problem using high-throughput sequencing technologies. We provide combinatorial formulations for this problem under a maximum parsimony assumption and design approximation algorithms for them. We also extend our proposed algorithms to consider conflicts between potential structural variations and to resolve them. Our algorithms are able to detect most of the well-known structural variation types, including small insertions, deletions, inversions, and transpositions. Finally, we extend our algorithms to allow simultaneous discovery of structural variations in multiple genomes, thereby improving the final comparative results between different donors.
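As a rough illustration of what a maximum parsimony formulation can look like, the sketch below treats SV discovery as a set cover problem: each candidate SV is supported by a set of discordant reads, and the goal is to explain every read with as few SVs as possible. This is only a minimal sketch under stated assumptions, not the dissertation's actual algorithms. The function name `greedy_parsimonious_svs`, the candidate labels, and the read identifiers are all hypothetical, and the standard greedy set cover heuristic is used here as a stand-in approximation algorithm.

```python
from typing import Dict, Hashable, List, Set


def greedy_parsimonious_svs(candidates: Dict[Hashable, Set[str]]) -> List[Hashable]:
    """Pick a small set of candidate SVs that together explain every read.

    `candidates` maps a hypothetical SV label to the set of read IDs that
    support it. This is the classic greedy set cover heuristic, used here
    as an assumed stand-in for a parsimony-based selection step.
    """
    # Every read that appears under at least one candidate must be explained.
    uncovered: Set[str] = set().union(*candidates.values())
    chosen: List[Hashable] = []

    while uncovered:
        # Choose the candidate explaining the most still-unexplained reads.
        # Ties go to the candidate that appears first in the dictionary.
        best = max(candidates, key=lambda sv: len(candidates[sv] & uncovered))
        chosen.append(best)
        uncovered -= candidates[best]

    return chosen


# Hypothetical toy input: five reads, four candidate SVs.
toy_candidates = {
    "del_chr1_1000": {"r1", "r2", "r3"},
    "inv_chr1_1200": {"r3", "r4"},
    "ins_chr2_500": {"r4", "r5"},
    "del_chr2_900": {"r5"},
}

selected = greedy_parsimonious_svs(toy_candidates)
print(selected)
```

On this toy input, two candidates suffice to explain all five reads. A greedy choice of this kind does not guarantee the minimum number of SVs in general, and it does not model conflicts between candidates or multiple genomes.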
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Sahinalp, Cenk