Evaluating the reproducibility of segmentation and genome annotation (SAGA) algorithms

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2022-07-26
Authors/Contributors
Abstract
Segmentation and genome annotation (SAGA) algorithms such as ChromHMM and Segway are widely used to annotate genomes from epigenomic datasets. These algorithms rely on probabilistic graphical models. They take a collection of genomics datasets as input, partition the genome into segments, and assign a label to each segment such that positions with the same label have similar patterns in the input data. The output is an annotation that assigns each genomic position an activity label, such as Enhancer or Transcribed. Despite the widespread application of SAGA methods, there is currently no principled way to evaluate the statistical significance of SAGA label assignments. In this study, we apply principles of reproducibility analysis to assess the statistical significance of, and the confidence that should be ascribed to, genome annotations obtained from SAGA algorithms. Moreover, by investigating individual variables that affect reproducibility, we aim to delineate different sources of irreproducibility in genome annotations. We hypothesize that reproducibility measurements provide more realistic confidence estimates for SAGA annotations, which will uncover irreproducible elements in existing annotations and remove doubt about those that withstand this statistical scrutiny.
Document
Extent
55 pages.
Identifier
etd22155
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Libbrecht, Maxwell
Language
English
Member of collection
Attachment: etd22155.pdf
Size: 18.98 MB