High-throughput sequencing in a variety of eukaryotes has revealed that circular RNAs (circRNAs) are widespread and abundant. circRNAs have been implicated in diseases such as cancer, in which they act as oncogenes or tumor suppressors, and therefore have potential as biomarkers or therapeutic targets. Accurate and rapid detection of circRNAs from short reads remains computationally challenging because identifying chimeric reads, which is essential for finding back-splice junctions, is a complex process. The sensitivity of a discovery method depends heavily on the underlying mapper used to find chimeric reads. Furthermore, all available circRNA discovery pipelines are resource-intensive. We introduce CircMiner, a novel stand-alone circRNA detection method that rapidly identifies and filters out linear RNA-Seq reads and detects back-splice junctions. CircMiner employs a rapid pseudo-alignment technique to identify linear reads that originate from transcripts, genes, or the genome. It then processes the remaining reads to identify back-splice junctions and detect circRNAs at single-nucleotide resolution. On request, and at the cost of additional running time, CircMiner also reports the mapping locations of the linear reads. We evaluated CircMiner on simulated datasets generated from known back-splice junctions and showed that it is more accurate and faster than existing circRNA detection tools. Additionally, on two RNase R-treated cell line datasets, CircMiner was able to detect more consistent, high-confidence circRNAs compared to untreated samples of the same cell line.
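To make the distinction between linear and chimeric reads concrete, the following minimal Python sketch labels a read that aligns in two pieces. It is an illustration only and is not CircMiner's algorithm: the `Segment` type and `classify_split_read` function are hypothetical, and it assumes that both segments are already aligned, that read offsets are given in the orientation of the forward reference strand, and that genomic coordinates are 0-based and half-open. Under those assumptions, a read whose leading part maps downstream of its trailing part on the same chromosome and strand is consistent with a back-splice junction.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a read (hypothetical structure, for illustration)."""
    chrom: str
    strand: str      # '+' or '-'
    read_start: int  # offset within the read, in forward-reference orientation
    ref_start: int   # 0-based, inclusive genomic start
    ref_end: int     # 0-based, exclusive genomic end


def classify_split_read(a: Segment, b: Segment) -> Tuple[str, Optional[Tuple[int, int]]]:
    """Return a label and, for a back-splice, the implied circRNA span."""
    if a.chrom != b.chrom or a.strand != b.strand:
        return "other", None  # inconsistent or fusion-like alignment
    # Order the two pieces by where they sit in the read.
    first, second = sorted((a, b), key=lambda s: s.read_start)
    if first.ref_end <= second.ref_start:
        # Read order matches genomic order: an ordinary (linear) splice.
        return "linear", None
    if second.ref_end <= first.ref_start:
        # Read order is reversed relative to the genome: back-splice candidate.
        return "back-splice", (second.ref_start, first.ref_end)
    return "other", None  # overlapping segments; not handled in this sketch


# Example: the head of the read maps downstream of its tail.
head = Segment("chr1", "+", 0, 1950, 2000)
tail = Segment("chr1", "+", 50, 1000, 1050)
label, span = classify_split_read(head, tail)
print(label, span)
```

A real detector would also need to handle mismatches, reads spanning more than two exons, paired-end consistency, and splice-signal checks, none of which this sketch attempts.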
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Bhattacharya, Binay