Circulating tumour DNA (ctDNA) has the potential to enable rapid, non-invasive monitoring of patient-specific tumour progression in oncogenomics. However, detecting low allele frequency variants in ctDNA raises many challenges, including the handling of sequencing errors. Tagging DNA molecules with Unique Molecular Identifiers (UMIs) aims to mitigate these errors: UMI-tagged molecules are PCR amplified and then sequenced independently. Analyzing UMI-tagged sequencing data requires clustering the reads that originate from the same molecule and then correcting the sequencing errors within each cluster. The size of current datasets requires this process to be resource-efficient. To address this problem, we introduce Calib, a computational tool that clusters and error-corrects UMI-tagged sequencing data. Calib is efficient, and its parameters have been optimized for different dataset setups. On simulated datasets, Calib is highly accurate. On a real dataset, Calib significantly reduces false positive rates in downstream variant calling.
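As a rough illustration of the two steps named above (clustering reads from the same molecule, then correcting errors within each cluster), the following minimal Python sketch groups reads by exact UMI match and takes a per-position majority vote. It is not Calib's algorithm: exact UMI matching, equal-length reads, and the function names `cluster_by_umi` and `consensus` are simplifying assumptions made for this example, and the reads are toy data.

```python
from collections import Counter, defaultdict


def cluster_by_umi(reads):
    """Group (umi, sequence) pairs by exact UMI match (a simplifying assumption)."""
    clusters = defaultdict(list)
    for umi, seq in reads:
        clusters[umi].append(seq)
    return dict(clusters)


def consensus(seqs):
    """Return the per-position majority base over equal-length sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("this sketch assumes all reads in a cluster have equal length")
    # zip(*seqs) yields one tuple of bases per position across the cluster.
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*seqs))


# Toy data: the third read carries a single substitution at position 3.
reads = [
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGTACGT"),
    ("AACG", "ACGAACGT"),
    ("TTGC", "GGCATGCA"),
    ("TTGC", "GGCATGCA"),
]

corrected = {umi: consensus(seqs) for umi, seqs in cluster_by_umi(reads).items()}
print(corrected)
```

In this toy input the substitution in the third read is outvoted by the two other reads sharing its UMI. In real data the UMIs themselves can contain sequencing errors, so exact matching would split reads from one molecule across several clusters.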
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Thesis advisor: Chauve, Cedric