Genotyping and copy number analysis of immunoglobulin heavy chain variable genes using long reads

Author: 
Date created: 
2019-08-21
Identifier: 
etd20586
Keywords: 
Convex optimization
Integer Linear Programming
Genomics
Immunoglobulin Heavy Chain Variable Genes
Genotyping
Copy Number Analysis
Long Reads
Allele Assignment Problem
Abstract: 

One of the remaining challenges in describing an individual's genetic variation lies in highly heterogeneous and complex genomic regions, which impede the use of classical reference-guided mapping and assembly approaches. One such region is the immunoglobulin heavy chain locus (IGH), which is critical for the development of antibodies and of the immune system. Presented is ImmunoTyper, the first PacBio-based genotyping and copy-number calling tool specifically designed for IGH variable (V) genes (IGHV). Validation against gold-standard IGH reference sequence data demonstrates that ImmunoTyper's multi-stage clustering and combinatorial optimization approach is the most comprehensive IGHV genotyping approach published to date. This preliminary work establishes the feasibility of fine-grained genotype and copy-number analysis using error-prone long reads in complex multi-gene loci. It also opens the door to in-depth investigation of IGHV heterogeneity using accessible and increasingly common whole-genome sequencing data.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: 
Maxwell Libbrecht
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.