
Resolving complexity of within-host pathogen strain diversity and antimicrobial resistance using optimization and machine learning

Resource type
Thesis type: (Thesis) M.Sc.
Date created
Hosts such as humans and arthropods are often infected by diverse bacterial populations, and such heterogeneity poses a great threat to effective treatment. Moreover, the development of antimicrobial resistance further reduces the efficacy of treatment. In this thesis, we present work towards solving two crucial problems in epidemiology: the within-host strain diversity problem and the antimicrobial resistance problem. For the strain diversity problem, we resolved complex infection and co-infection patterns within a sample by developing a two-stage optimization pipeline using integer linear programming. Results on simulated data showed that our methods achieved a precision of at least 92% and a recall of at least 90% for the first stage of our pipeline, and a soft-precision of at least 68% and a soft-recall of at least 79% for the second stage. Soft-precision and soft-recall are similar to precision and recall but less stringent. This demonstrates that our methods are able to predict strains that are extremely close to the true strains and that they are comparable to or better than existing tools. For the antimicrobial resistance problem, we explored different machine learning models to classify the response of a pathogen to a single drug, focusing specifically on analyzing the robustness of the models and deriving resistance-related markers from them. The models achieved at least 85% test AUC (area under the ROC curve) for first-line drugs and at least 76% for second-line drugs. Our results showed that the models are robust across datasets of varying diversity, and that resistance-related markers frequently reported in the literature were highly ranked by the models.
Our work on both problems paves the way for robust strain typing in the presence of within-host heterogeneity and for effective drug resistance diagnosis, overcoming essential challenges that are not currently addressed by any existing methodology for pathogen genomics.
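For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how exact-match precision and recall and the area under the ROC curve are commonly computed. It is not the thesis code: the function names, the strain strings, and the scores are hypothetical, and the less stringent soft-precision and soft-recall are not defined in this record, so they are not reproduced here.

```python
def precision_recall(predicted, truth):
    """Exact-match precision and recall between two collections of strains."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive (label 1)
    receives a higher score than a randomly chosen negative (label 0),
    with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Hypothetical data: two true strains, three predicted strains.
precision, recall = precision_recall(["ACGT", "ACGA", "TTTT"], ["ACGT", "ACGA"])

# Hypothetical data: 1 = resistant, 0 = susceptible, with model scores.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In this toy example one of the three predicted strains is wrong, which lowers precision but not recall, and one resistant isolate is scored below a susceptible one, which lowers the AUC below 1.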
Copyright statement
Copyright is held by the author.
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Chindelevitch, Leonid
Member of collection
Download file: etd19883.pdf (14.89 MB)