Application of linear and integer programming to three challenging problems in computational biology

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2021-08-03
Abstract
Linear Programming (LP) and Integer Linear Programming (ILP) have increasingly been used in computational and systems biology methods in the past 24 years. From RNA and protein structure prediction to the analysis of biological networks, ILP and ILP-based methods provide natural, easy-to-maintain, and extensible solutions for many NP-hard biological optimization problems. This thesis aims to provide solutions to three challenging problems in systems biology, infectious disease, and epidemiology. First, we present a four-step framework to verify and diagnose elemental balance violations in metabolic networks. Identifying such violations can be particularly challenging, since the chemical formulas of the metabolites in a metabolic network are often partially or entirely unspecified. However, our framework detects such violations efficiently and suggests corrections without requiring the chemical formula of each metabolite to be specified. We have applied our framework to a collection of 94 previously published metabolic network models and successfully detected elemental balance violations in 46 of them. Next, we introduce INGOT-DR, an interpretable classifier for predicting drug resistance. Our classifier uses group testing and Boolean compressed sensing to provide highly accurate and interpretable predictions, which could help in investigating the mechanism of drug resistance in pathogenic bacteria such as Mycobacterium tuberculosis. Our method is also flexible enough to be optimized for various evaluation metrics at the same time. INGOT-DR has been tested for predicting drug resistance on five first-line and seven second-line antibiotics used for treating tuberculosis, and it showed higher or comparable accuracy relative to commonly used machine learning models for phenotype-genotype prediction. Our method was also able to identify variants located in genes previously reported to be associated with drug resistance.
Finally, we present GroupTesting, a modular software platform for the comprehensive evaluation of non-adaptive group testing strategies. This software can perform the evaluation both in a noiseless setting and in the presence of single or multiple realistic noise sources modeled on published experimental observations, which makes it applicable to polymerase chain reaction (PCR) tests, the dominant type of test for SARS-CoV-2.
Extent
104 pages.
Identifier
etd21573
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Chindelevitch, Leonid
Language
English
Attachment Size
etd21573.pdf 7.14 MB