mrsFAST-Ultra: A Compact, SNP-aware Mapper for High Performance Sequencing Applications

Date created
2013-06-24
Authors/Contributors
Abstract
High-throughput sequencing (HTS) platforms generate unprecedented amounts of data, which introduces challenges for processing and downstream analysis. While tools that report the "best" mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis, such as structural variation detection, where it is important to report multiple mapping loci for each read. In fact, multi-mapping is the main bottleneck in genomic structural variation detection, RNA-Seq data analysis, and similar applications. For this purpose we introduce mrsFAST-Ultra, a fast, cache-oblivious, SNP-aware aligner that handles the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves on mrsFAST, the first cache-oblivious read aligner capable of handling multi-mappings, through new and compact index structures that reduce both the memory usage and the number of CPU operations per alignment. The index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. Equally important, mrsFAST-Ultra introduces new features: it can (1) obtain the best mapping loci for each read, and (2) return mapping locations for all reads that have at most k mapping loci (within an error threshold), for any user-specified k. Furthermore, mrsFAST-Ultra is SNP-aware, i.e., it can map reads to a reference genome while discounting the mismatches that occur at common SNP locations provided by dbSNP; this significantly increases the number of reads that can be mapped to the reference genome. Finally, mrsFAST-Ultra takes advantage of multiple cores and can be tuned for different memory settings. Our results show that mrsFAST-Ultra is up to 4.5 times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as BWA and Bowtie2, it is more sensitive (it can report 60 or more times as many mappings per read) and much faster (by a factor of 5 or more) in the multi-mapping mode.
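The two mapping ideas named in the abstract, discounting mismatches at known SNP locations and reporting only reads with at most k mapping loci, can be illustrated with a minimal brute-force sketch. This is not the mrsFAST-Ultra implementation and does not use its index structures; the function names `snp_aware_mismatches` and `map_read`, the parameter names, and the representation of SNP locations as a set of 0-based reference positions are all assumptions made for illustration.

```python
from typing import List, Optional, Set


def snp_aware_mismatches(read: str, ref: str, pos: int,
                         snp_positions: Set[int]) -> int:
    """Count mismatches between `read` and `ref` starting at `pos`,
    skipping any mismatch that falls on a known SNP position.

    Assumption: SNP locations are given as 0-based reference offsets.
    """
    mismatches = 0
    for i, base in enumerate(read):
        ref_pos = pos + i
        if base != ref[ref_pos] and ref_pos not in snp_positions:
            mismatches += 1
    return mismatches


def map_read(read: str, ref: str, snp_positions: Set[int],
             max_errors: int, k: Optional[int] = None) -> List[int]:
    """Return every locus where `read` aligns within `max_errors`
    SNP-discounted mismatches (hypothetical helper, brute-force scan).

    If `k` is given and the read has more than `k` loci, report nothing,
    which is one reading of "reads that have at most k mapping loci".
    """
    hits = [
        pos
        for pos in range(len(ref) - len(read) + 1)
        if snp_aware_mismatches(read, ref, pos, snp_positions) <= max_errors
    ]
    if k is not None and len(hits) > k:
        return []
    return hits


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    # With a SNP recorded at position 3, the read maps exactly at locus 0.
    print(map_read("ACGA", reference, {3}, max_errors=0))
    # Without SNP information, one mismatch must be allowed to map at all.
    print(map_read("ACGA", reference, set(), max_errors=1))
```

In this toy setting, a read that fails to map with zero allowed errors becomes mappable once the SNP position is taken into account, which is the effect the abstract describes as increasing the number of mappable reads.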
Document
Identifier
etd7875
Copyright statement
Copyright is held by the author.
Permissions
The author granted permission for the file to be printed and for the text to be copied and pasted.
Scholarly level
Member of collection
Download file Size
etd7875_ISarrafi.pdf 860.55 KB