Quantitative analysis of the coding capacity of C. elegans using RNA-Seq data

Thesis type
(Thesis) M.Sc.
Abstract
Annotating the genome of the nematode Caenorhabditis elegans has been an ongoing challenge for the last twenty years. Studies have leveraged high-throughput RNA sequencing (RNA-Seq) to uncover evidence for thousands of novel splicing events, indicating that the current annotations are far from complete. Yet it remains uncertain whether the many rare events represent functional transcripts or simply biological noise. We developed a method that leverages the wealth of publicly available RNA-Seq data to perform a quantitative evaluation of the completeness of the current C. elegans genome annotation. We identified 134,949 novel high-quality introns and 204,812 novel high-quality exons. We found that many of these introns and exons are rarely expressed overall but strongly expressed at specific developmental stages, suggesting a functional role. We assembled a high-quality set of 72,274 protein-coding transcripts to show that only a fraction of the coding transcriptome of C. elegans is represented in the current genome annotation.
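The abstract does not spell out how novel introns are called or how stage-specific expression is detected. The following is a minimal sketch under stated assumptions, not the thesis's actual method. It assumes that splice-junction read counts have already been extracted from RNA-Seq alignments and broken down by developmental stage. Under that assumption, one simple approach keeps introns with enough total read support that are absent from the annotation, and flags introns whose reads come mostly from a single stage. The `Intron` type, the function names, and the `min_reads` and `min_fraction` cutoffs are all hypothetical illustrations.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Intron(NamedTuple):
    """Hypothetical intron key: genomic coordinates plus strand."""
    chrom: str
    start: int  # first intronic base (assumed 1-based, inclusive)
    end: int    # last intronic base (assumed 1-based, inclusive)
    strand: str


# Supporting read counts for each observed junction, keyed by stage name.
JunctionCounts = Dict[Intron, Dict[str, int]]


def novel_introns(counts: JunctionCounts, annotated: Set[Intron],
                  min_reads: int = 10) -> Set[Intron]:
    """Return introns with at least `min_reads` reads summed over all stages
    that are absent from the annotation. `min_reads` is an assumed cutoff."""
    return {
        intron
        for intron, by_stage in counts.items()
        if sum(by_stage.values()) >= min_reads and intron not in annotated
    }


def stage_specific(counts: JunctionCounts,
                   min_fraction: float = 0.8) -> Dict[Intron, Tuple[str, float]]:
    """Flag introns where a single stage contributes at least `min_fraction`
    of all supporting reads; return that stage and its share of the reads."""
    flagged: Dict[Intron, Tuple[str, float]] = {}
    for intron, by_stage in counts.items():
        total = sum(by_stage.values())
        if total == 0:
            continue  # no read support at all, nothing to assess
        stage, reads = max(by_stage.items(), key=lambda kv: kv[1])
        fraction = reads / total
        if fraction >= min_fraction:
            flagged[intron] = (stage, fraction)
    return flagged


if __name__ == "__main__":
    annotated = {Intron("I", 100, 200, "+")}
    counts = {
        Intron("I", 100, 200, "+"): {"L1": 50, "L4": 40},   # annotated, broadly expressed
        Intron("I", 300, 450, "+"): {"embryo": 2, "L4": 30},  # novel, mostly L4
        Intron("II", 10, 90, "-"): {"L1": 3},                # too few reads to call
    }
    print(novel_introns(counts, annotated))
    # → {Intron(chrom='I', start=300, end=450, strand='+')}
    print(stage_specific(counts))
```

Note that the two filters are independent: an intron can be stage-specific yet still fail the overall read cutoff (the `II` junction above), which mirrors the abstract's point that rarely expressed events may be strongly expressed at one stage. A real analysis would likely normalize for sequencing depth per stage before comparing counts, which this sketch does not do.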
Copyright statement
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Chen, Jack
Attachment
etd20050.pdf (4.53 MB)