Alignment-free clustering and error correction of UMI tagged DNA molecules

Thesis type: (Thesis) M.Sc.
Author: Orabi, Baraa
The use of circulating tumour DNA (ctDNA) in oncogenomics has the potential to enable rapid, non-invasive monitoring of patient-specific tumour progression. However, detecting low-allele-frequency variants in ctDNA raises many challenges, including the handling of sequencing errors. Tagging DNA molecules with Unique Molecular Identifiers (UMIs) attempts to mitigate sequencing errors: UMI-tagged molecules are PCR-amplified and then sequenced independently. Analyzing UMI-tagged sequencing data requires clustering the reads that originate from the same molecule and then correcting sequencing errors within each cluster. The size of current datasets requires this process to be resource-efficient. To address this problem, we introduce Calib, a computational tool that clusters and error-corrects UMI-tagged sequencing data. Calib is efficient, and its parameters have been optimized for different dataset setups. On simulated datasets, Calib is highly accurate. On a real dataset, using Calib significantly reduces false positive rates in downstream variant calling.
Copyright statement
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Chauve, Cedric
Attachment Size
etd19999.pdf 2.3 MB