Resource type
Thesis type
(Thesis) M.Sc.
Date created
2019-11-29
Authors/Contributors
Author: Katebi, Mohsen
Abstract
In this work, we study the problem of clustering bacterial isolates into epidemiologically related groups from next-generation sequencing data. Existing methods for this problem mainly rely on a single genotyping signal, and use either a distance-based method with a pre-specified number of clusters or a phylogenetic tree-based method with a pre-specified threshold. We propose PathOGiST, an open-source algorithmic framework for clustering bacterial isolates that leverages multiple genotypic signals and calibrated thresholds. PathOGiST clusters the isolates separately on each genotypic signal with correlation clustering, and then combines the resulting per-signal clusterings with consensus clustering. We implemented and tested PathOGiST on three bacterial pathogens (Escherichia coli, Yersinia pseudotuberculosis, and Mycobacterium tuberculosis) and found that it outperforms most existing methods. We conclude by discussing how our framework can be extended and some of the challenges that remain to be addressed.
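The two-stage idea in the abstract (cluster on each signal, then reconcile the clusterings) can be illustrated with a minimal Python sketch. This is not the PathOGiST implementation: it swaps in a simple greedy pivot heuristic for correlation clustering and a majority vote for the consensus step, and every function name, distance matrix, and threshold below is a hypothetical toy value chosen for illustration.

```python
def pivot_clustering(n, similar):
    """Greedy pivot heuristic for correlation clustering (illustrative only).

    similar(i, j) returns True when items i and j should share a cluster.
    Each unassigned item becomes a pivot and absorbs its unassigned neighbours.
    """
    labels = [-1] * n
    next_label = 0
    for pivot in range(n):
        if labels[pivot] != -1:
            continue  # already absorbed by an earlier pivot
        labels[pivot] = next_label
        for j in range(pivot + 1, n):
            if labels[j] == -1 and similar(pivot, j):
                labels[j] = next_label
        next_label += 1
    return labels


def cluster_by_threshold(dist, threshold):
    """Cluster one signal: pairs at distance <= threshold count as similar."""
    return pivot_clustering(len(dist), lambda i, j: dist[i][j] <= threshold)


def consensus(clusterings):
    """Combine clusterings: a pair is similar if most clusterings co-cluster it."""
    n = len(clusterings[0])

    def majority_together(i, j):
        votes = sum(1 for c in clusterings if c[i] == c[j])
        return 2 * votes > len(clusterings)

    return pivot_clustering(n, majority_together)


# Toy pairwise distances for four isolates under three hypothetical signals.
signal_a = [[0, 2, 9, 9], [2, 0, 9, 9], [9, 9, 0, 3], [9, 9, 3, 0]]
signal_b = [[0, 1, 1, 8], [1, 0, 1, 8], [1, 1, 0, 8], [8, 8, 8, 0]]
signal_c = [[0, 1, 7, 7], [1, 0, 7, 7], [7, 7, 0, 1], [7, 7, 1, 0]]

# Hypothetical per-signal thresholds (the thesis calibrates these; here they are made up).
per_signal = [
    cluster_by_threshold(signal_a, 5),
    cluster_by_threshold(signal_b, 2),
    cluster_by_threshold(signal_c, 3),
]
combined = consensus(per_signal)
print(per_signal)
print(combined)
```

In this toy example the second signal disagrees with the other two about the third isolate, and the majority vote resolves the disagreement in favour of the grouping that two of the three signals support.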
Document
Identifier
etd20726
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Chindelevitch, Leonid
Member of collection
| Download file | Size |
|---|---|
| etd20726.pdf | 728.79 KB |