Unsupervised continuous feature annotation of the human genome

Date created: 
2020-04-14
Identifier: 
etd20807
Keywords: 
Unsupervised learning
Epigenomics
Sequencing-based assays
Genome annotation
Human genome
Continuous modelling
Abstract: 

Genome annotation methods are widely used to understand the function of the genome. For example, they can be used to identify the activity of a genomic position that is associated with a disease. Existing genome annotation methods produce discrete annotations that assign a single label to each genomic position. However, discrete annotations have several limitations: they cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, an annotation strategy is proposed that instead outputs a vector of chromatin state features at each position. A method, epigenome-ssm, is also proposed to annotate the genome with chromatin state features. It is shown that chromatin state features from epigenome-ssm are more useful than both continuous and discrete alternatives for several downstream applications, including identifying expressed genes and enhancers.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Supervisor(s): 
Kay C Wiese
Maxwell Libbrecht
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.