Quantitative analysis of the coding capacity of C. elegans using RNA-Seq data

Author: 
Date created: 
2018-11-30
Identifier: 
etd20050
Keywords: 
Coding capacity
Alternative splicing
Caenorhabditis elegans
RNA-Seq
Transcriptome
Bioinformatics
Abstract: 

Annotating the genome of the nematode Caenorhabditis elegans has been an ongoing challenge for the last twenty years. Studies have leveraged high-throughput RNA sequencing (RNA-Seq) to uncover evidence for thousands of novel splicing events, indicating that the current annotations are far from complete. Yet it remains uncertain whether the many rare events represent functional transcripts or simply biological noise. We developed a method that leverages the wealth of publicly available RNA-Seq data to quantitatively evaluate the completeness of the current C. elegans genome annotation. We identified 134,949 novel high-quality introns and 204,812 novel high-quality exons. We find that many introns and exons are rarely expressed overall but strongly expressed at specific developmental stages, suggesting a functional role. We assembled a high-quality set of 72,274 protein-coding transcripts and show that only a fraction of the coding transcriptome of C. elegans is represented in the current genome annotation.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Jack Chen
Department: 
Science: Department of Molecular Biology and Biochemistry
Thesis type: 
(Thesis) M.Sc.