
PathOGiST: a novel method for clustering pathogen isolates by combining multiple genotyping signals

Resource type
Thesis type: (Thesis) M.Sc.
Date created: 2019-11-29
Authors/Contributors
Abstract
In this work, we study the problem of clustering bacterial isolates into epidemiologically related groups using next-generation sequencing data. Existing methods for this problem mainly rely on a single genotyping signal. They either apply a distance-based method with a pre-specified number of clusters or a phylogenetic tree-based method with a pre-specified threshold. We propose PathOGiST, an open-source algorithmic framework that clusters bacterial isolates by leveraging multiple genotyping signals and calibrated thresholds. PathOGiST first clusters the isolates separately on each genotyping signal using correlation clustering. It then combines these per-signal clusterings into a single clustering using consensus clustering. We implemented PathOGiST and tested it on three bacterial pathogens (Escherichia coli, Yersinia pseudotuberculosis, and Mycobacterium tuberculosis), and found that it outperforms most existing methods. We conclude by discussing how our framework can be extended and which challenges remain to be addressed.
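The two-stage pipeline described in the abstract can be illustrated with a small sketch. This is not PathOGiST's implementation. It assumes that each genotyping signal is given as a pairwise distance matrix and that a pair of isolates counts as "similar" when its distance is at most a per-signal threshold. In place of whatever correlation-clustering solver the thesis uses, it substitutes the randomized greedy pivot heuristic (KwikCluster). Consensus is approximated by a majority vote over pairs: two isolates are linked when more than half of the per-signal clusterings place them together, and the resulting graph is clustered with the same heuristic. The signal names ("snp", "mlst", "cnv"), distance matrices, and thresholds are hypothetical toy values, not calibrated ones.

```python
import random
from typing import Dict, List, Sequence

Matrix = Sequence[Sequence[float]]


def signed_graph(dist: Matrix, threshold: float) -> List[List[bool]]:
    """Link isolates i != j whose distance is at most the (hypothetical) threshold."""
    n = len(dist)
    return [[i != j and dist[i][j] <= threshold for j in range(n)] for i in range(n)]


def pivot_cluster(similar: List[List[bool]], seed: int = 0) -> List[int]:
    """KwikCluster-style pivot heuristic for correlation clustering (approximate)."""
    remaining = list(range(len(similar)))
    random.Random(seed).shuffle(remaining)  # random pivot order
    labels = [-1] * len(similar)
    next_label = 0
    while remaining:
        pivot = remaining[0]
        # The pivot takes every still-unassigned isolate it is linked to.
        members = {v for v in remaining if v == pivot or similar[pivot][v]}
        for v in members:
            labels[v] = next_label
        next_label += 1
        remaining = [v for v in remaining if v not in members]
    return labels


def consensus(clusterings: List[List[int]], seed: int = 0) -> List[int]:
    """Link a pair when a strict majority of clusterings co-cluster it, then re-cluster."""
    n, m = len(clusterings[0]), len(clusterings)
    similar = [
        [i != j and 2 * sum(c[i] == c[j] for c in clusterings) > m for j in range(n)]
        for i in range(n)
    ]
    return pivot_cluster(similar, seed)


def as_partition(labels: List[int]) -> List[List[int]]:
    """Canonical form: sorted groups of isolate indices, independent of label values."""
    groups: Dict[int, List[int]] = {}
    for isolate, label in enumerate(labels):
        groups.setdefault(label, []).append(isolate)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical toy data: 4 isolates, 3 genotyping signals as distance matrices.
signals: Dict[str, Matrix] = {
    "snp": [[0, 2, 40, 41], [2, 0, 39, 40], [40, 39, 0, 3], [41, 40, 3, 0]],
    "mlst": [[0, 1, 7, 7], [1, 0, 7, 7], [7, 7, 0, 0], [7, 7, 0, 0]],
    "cnv": [[0, 1, 1, 9], [1, 0, 1, 9], [1, 1, 0, 9], [9, 9, 9, 0]],
}
thresholds = {"snp": 10, "mlst": 2, "cnv": 2}  # hypothetical, not calibrated

per_signal = [pivot_cluster(signed_graph(signals[s], thresholds[s])) for s in signals]
final = consensus(per_signal)
print([as_partition(p) for p in per_signal])
# [[[0, 1], [2, 3]], [[0, 1], [2, 3]], [[0, 1, 2], [3]]]
print(as_partition(final))
# [[0, 1], [2, 3]]
```

Here the "cnv" signal disagrees with the other two about isolate 2, and the majority vote overrules it. Because every toy similarity graph is a union of disjoint cliques, the pivot heuristic recovers those cliques exactly regardless of pivot order. On real, noisy data it only gives an approximate solution, which is one reason an exact solver might be preferred.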
Document
Identifier: etd20726
Copyright statement: Copyright is held by the author.
Permissions: This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Chindelevitch, Leonid
Member of collection
Download file: etd20726.pdf (728.79 KB)
