Foroozandeh Shahraki, Mehdi

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2022-07-26

Authors/Contributors

Author: Foroozandeh Shahraki, Mehdi

Abstract

Segmentation and genome annotation (SAGA) algorithms such as ChromHMM and Segway are widely used to annotate the genome from epigenomic datasets. These algorithms rely on probabilistic graphical models. They take as input a collection of genomic datasets, partition the genome into segments, and assign a label to each segment so that positions with the same label have similar patterns in the input data. The output is an annotation that assigns each genomic position an annotated activity, such as Enhancer or Transcribed. Despite the widespread application of SAGA methods, there is currently no principled way to evaluate the statistical significance of SAGA label assignments. In this study, we apply principles of reproducibility analysis to assess the statistical significance of, and the confidence to be ascribed to, genome annotations obtained from SAGA algorithms. Moreover, by investigating individual variables that affect reproducibility, we try to delineate the sources of irreproducibility in genome annotations. We hypothesize that reproducibility measurements provide more realistic confidence estimates for SAGA annotations, which will uncover irreproducible elements in existing annotations and remove doubt about those that stand up to this statistical scrutiny.
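To make the idea of comparing two SAGA annotations concrete, here is a minimal sketch. It is not the method used in the thesis. It assumes that each annotation is a list with one label per fixed-size genomic bin, that both annotations cover the same bins, and that their labels already share one vocabulary. In practice, labels from independent SAGA runs must first be matched to each other, and that step is not shown. The function names `to_segments` and `per_label_agreement` are hypothetical. The agreement score is a simple overlap fraction, not a statistical significance test.

```python
from collections import Counter
from itertools import groupby


def to_segments(labels):
    """Collapse per-bin labels into (start, end, label) segments.

    Consecutive bins with the same label form one half-open segment [start, end).
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)  # number of consecutive bins with this label
        segments.append((start, start + length, label))
        start += length
    return segments


def per_label_agreement(rep1, rep2):
    """For each label in rep1, the fraction of its bins that carry the same label in rep2.

    Assumes both annotations cover the same bins and use an already-matched label set.
    """
    if len(rep1) != len(rep2):
        raise ValueError("annotations must cover the same bins")
    totals = Counter(rep1)  # bins per label in the first annotation
    agree = Counter(a for a, b in zip(rep1, rep2) if a == b)  # bins where both runs agree
    return {label: agree[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    rep1 = ["Enhancer", "Enhancer", "Transcribed", "Transcribed", "Quiescent", "Quiescent"]
    rep2 = ["Enhancer", "Quiescent", "Transcribed", "Transcribed", "Quiescent", "Enhancer"]
    print(to_segments(rep1))
    print(per_label_agreement(rep1, rep2))
```

With this toy input, the Transcribed bins agree fully between the two runs, while the Enhancer and Quiescent bins each agree at half of their positions. Under these assumptions, a label with low agreement would be a candidate for the kind of irreproducible element described above.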

Extent

55 pages.

Keywords

Identifier

etd22155

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Libbrecht, Maxwell

Language

English

Member of collection

Computing Science Theses


Evaluating the reproducibility of segmentation and genome annotation (SAGA) algorithms
