Genome annotation methods are widely used to understand the function of the genome. For example, they can be used to identify the activity of a genomic position that is associated with a disease. Existing genome annotation methods produce discrete annotations that assign a single label to each genomic position. However, discrete annotations have several limitations: they cannot easily represent varying strengths of genomic elements, and they cannot easily represent combinatorial elements that simultaneously exhibit multiple types of activity. To remedy these limitations, an annotation strategy is proposed that instead outputs a vector of chromatin state features at each position. A method, epigenome-ssm, is also proposed to annotate the genome with chromatin state features. It is shown that chromatin state features from epigenome-ssm are more useful than both continuous and discrete alternatives for several downstream applications, including the identification of expressed genes and enhancers.
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.