Computational Discovery of Splicing Events from High-Throughput Omics Data

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2017-07-18
Authors/Contributors
Author (aut): Lin, Yen Yi
Abstract
Splicing, the process of forming mature messenger RNA (mRNA) by removing introns and concatenating exons, is an essential step in gene expression. It allows a single gene to have multiple RNA isoforms, which potentially code for different proteins. In addition, aberrant transcripts generated from non-canonical splicing events (e.g. gene fusions) are believed to be potential drivers in many tumor types and human diseases. Thus, identification and quantification of expressed RNAs from RNA-Seq data have become fundamental steps in many clinical studies, and a number of methods have been developed for that purpose. The most popular computational methods designed for these high-throughput omics data start by analyzing the datasets based on existing gene annotations. However, these tools (i) do not detect novel RNA isoforms and low-abundance transcripts; (ii) do not incorporate multi-mapping reads in their read-counting strategies for quantification; and (iii) are sensitive to sequencing artifacts. In this thesis, we address these computational problems in analyzing splicing events from high-throughput omics data. For identification and quantification of expressed RNAs from RNA-Seq data, we introduce CLIIQ, a unified framework that solves these two problems simultaneously. This framework also supports data from multiple samples to improve accuracy. To better incorporate multi-mapping reads into the framework, we design ORMAN, a combinatorial optimization formulation that resolves their mapping ambiguity by assigning a single best location to each read. For aberrant transcript detection, we present ProTIE, a computational strategy to integratively analyze proteomic and transcriptomic data from the same individual. This strategy provides proteome-level evidence for aberrant transcripts, which can be used to eliminate false positives reported solely on the basis of sequencing data.
Document
Identifier
etd10302
Copyright statement
Copyright is held by the author.
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor (ths): Sahinalp, Cenk
Thesis advisor (ths): Ester, Martin
Member of collection
Download file: etd10302_YLin.pdf
Size: 1.5 MB