Genome rearrangements are important mutational events in many cancers, and their detection and characterization have the potential to improve treatment options for cancer patients. Evidence of genome rearrangement is available in the sequences of affected DNA and RNA molecules of tumour cells. The development of high-throughput sequencing has drastically increased the efficiency with which researchers can sequence DNA and RNA molecules, but the new technologies have also increased the computational burden and require solutions to novel algorithmic problems. In this thesis we describe novel algorithms for the detection and characterization of genome rearrangements, with a specific focus on rearrangements that reshape tumour genomes and impact cancer biology. We first describe a method for detecting gene fusions from RNA sequence data (RNA-Seq). Given both RNA-Seq and Whole Genome Sequence (WGS) data, we then describe an integrated method for the detection of expressed rearrangements, and subsequently extend this method to account for complex genomic rearrangements. Finally, we describe a method for detecting rearrangements that exist in subpopulations of tumour cells and for determining their impact on the content of the genome in those subpopulations. Each of the described methods formulates a maximum parsimony or likelihood optimization problem and proposes combinatorial algorithms to solve it. A common theme across the described methods is the benefit of integrating multiple and diverse data types. We demonstrate, using simulated and real data, that principled methods for joint analysis of multiple data types frequently outperform independent analyses of each data type. We apply our methods to the detection and characterization of rearrangements in tumour samples, and provide novel examples of events relevant to the biology of each tumour.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.