Computing Science, School of

Receive updates for this collection

INGOT-DR: An Interpretable Classifier for Predicting Drug Resistance in M. Tuberculosis

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2021-08-10
Abstract: 

Motivation

Prediction of drug resistance and identification of its mechanisms in bacteria such as Mycobacterium tuberculosis, the etiological agent of tuberculosis, is a challenging problem. Solving this problem requires a transparent, accurate, and flexible predictive model. The methods currently used for this purpose rarely satisfy all of these criteria. On the one hand, approaches based on testing strains against a catalogue of previously identified mutations often yield poor predictive performance; on the other hand, machine learning techniques typically have higher predictive accuracy, but often lack interpretability and may learn patterns that produce accurate predictions for the wrong reasons. Current interpretable methods may either exhibit a lower accuracy or lack the flexibility needed to generalize them to previously unseen data.

Contribution

In this paper we propose a novel technique, inspired by group testing and Boolean compressed sensing, which yields highly accurate predictions, interpretable results, and is flexible enough to be optimized for various evaluation metrics at the same time.

Results

We test the predictive accuracy of our approach on five first-line and seven second-line antibiotics used for treating tuberculosis. We find that it has a higher or comparable accuracy to that of commonly used machine learning models, and is able to identify variants in genes with previously reported association to drug resistance. Our method is intrinsically interpretable, and can be customized for different evaluation metrics. Our implementation is available at github.com/hoomanzabeti/INGOT_DR and can be installed via The Python Package Index (Pypi) under ingotdr. This package is also compatible with most of the tools in the Scikit-learn machine learning library.

Document type: 
Article
File(s): 

A Deeper Look at Autonomous Vehicle Ethics: An Integrative Ethical Decision-Making Framework to Explain Moral Pluralism

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2021-05-04
Abstract: 

The autonomous vehicle (AV) is one of the first commercialized AI-embedded robots to make autonomous decisions. Despite technological advancements, unavoidable AV accidents that result in life-and-death consequences cannot be completely eliminated. The emerging social concern of how an AV should make ethical decisions during unavoidable accidents is referred to as the moral dilemma of AV, which has promoted heated discussions among various stakeholders. However, there are research gaps in explainable AV ethical decision-making processes that predict how AVs’ moral behaviors are made that are acceptable from the AV users’ perspectives. This study addresses the key question: What factors affect ethical behavioral intentions in the AV moral dilemma? To answer this question, this study draws theories from multidisciplinary research fields to propose the “Integrative ethical decision-making framework for the AV moral dilemma.” The framework includes four interdependent ethical decision-making stages: AV moral dilemma issue framing, intuitive moral reasoning, rational moral reasoning, and ethical behavioral intention making. Further, the framework includes variables (e.g., perceived moral intensity, individual factors, and personal moral philosophies) that influence the ethical decision-making process. For instance, the framework explains that AV users from Eastern cultures will tend to endorse a situationist ethics position (high idealism and high relativism), which views that ethical decisions are relative to context, compared to AV users from Western cultures. This proposition is derived from the link between individual factors and personal moral philosophy. Moreover, the framework proposes a dual-process theory, which explains that both intuitive and rational moral reasoning are integral processes of ethical decision-making during the AV moral dilemma. Further, this framework describes that ethical behavioral intentions that lead to decisions in the AV moral dilemma are not fixed, but are based on how an individual perceives the seriousness of the situation, which is shaped by their personal moral philosophy. This framework provides a step-by-step explanation of how pluralistic ethical decision-making occurs, reducing the abstractness of AV moral reasoning processes.

Document type: 
Article
File(s): 

Predicting the Clinical Management of Skin Lesions Using Deep Learning

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2021-04-08
Abstract: 

Automated machine learning approaches to skin lesion diagnosis from images are approaching dermatologist-level performance. However, current machine learning approaches that suggest management decisions rely on predicting the underlying skin condition to infer a management decision without considering the variability of management decisions that may exist within a single condition. We present the first work to explore image-based prediction of clinical management decisions directly without explicitly predicting the diagnosis. In particular, we use clinical and dermoscopic images of skin lesions along with patient metadata from the Interactive Atlas of Dermoscopy dataset (1011 cases; 20 disease labels; 3 management decisions) and demonstrate that predicting management labels directly is more accurate than predicting the diagnosis and then inferring the management decision (13.73±3.93% and 6.59±2.86% improvement in overall accuracy and AUROC respectively), statistically significant at p<0.001. Directly predicting management decisions also considerably reduces the over-excision rate as compared to management decisions inferred from diagnosis predictions (24.56% fewer cases wrongly predicted to be excised). Furthermore, we show that training a model to also simultaneously predict the seven-point criteria and the diagnosis of skin lesions yields an even higher accuracy (improvements of 4.68±1.89% and 2.24±2.04% in overall accuracy and AUROC respectively) of management predictions. Finally, we demonstrate our model’s generalizability by evaluating on the publicly available MClass-D dataset and show that our model agrees with the clinical management recommendations of 157 dermatologists as much as they agree amongst each other.

Document type: 
Article
File(s): 

Introduction to Causality

Peer reviewed: 
No, item is not peer reviewed.
Date created: 
2020-12-18
Abstract: 

In scientific discovery, engineering, imaging, and machine learning, it is often critical to understand what causes an event or observation, rather than focusing on correlation/association alone. In order to make this complex topic more accessible I would like to share what I learned on how causality can be applied and what concepts are essential in doing so. I will introduce the graphical causal model (Bayesian network), and show how you can translate human intuition on causality into formal axioms that fuse the causal graph with the probability space from observed events. We will discuss counterfactual causality, and end with an overview of recommendations on how to use causality in practice as well as current open issues and relevant papers that tackle those questions.

Document type: 
Lecture / Talk
File(s): 

HASLR: Fast Hybrid Assembly of Long Reads

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2020-08-21
Abstract: 

Third-generation sequencing technologies from companies such as Oxford Nanopore and Pacific Biosciences have paved the way for building more contiguous and potentially gap-free assemblies. The larger effective length of their reads has provided a means to overcome the challenges of short to mid-range repeats. Currently, accurate long read assemblers are computationally expensive, whereas faster methods are not as accurate. Moreover, despite recent advances in third-generation sequencing, researchers still tend to generate accurate short reads formany of the analysis tasks. Here,we present HASLR, a hybrid assembler that uses error-prone long reads together with high-quality short reads to efficiently generate accurate genome assemblies. Our experiments show that HASLR is not only the fastest assembler but also the one with the lowest number of misassemblies on most of the samples, while being on par with other assemblers in terms of contiguity and accuracy.

Document type: 
Article
File(s): 

Deep Learning for Edge Computing Applications: A State-of-the-Art Survey

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2020-03-23
Abstract: 

With the booming development of Internet-of-Things (IoT) and communication technologies such as 5G, our future world is envisioned as an interconnected entity where billions of devices will provide uninterrupted service to our daily lives and the industry. Meanwhile, these devices will generate massive amounts of valuable data at the network edge, calling for not only instant data processing but also intelligent data analysis in order to fully unleash the potential of the edge big data. Both the traditional cloud computing and on-device computing cannot sufficiently address this problem due to the high latency and the limited computation capacity, respectively. Fortunately, the emerging edge computing sheds a light on the issue by pushing the data processing from the remote network core to the local network edge, remarkably reducing the latency and improving the efficiency. Besides, the recent breakthroughs in deep learning have greatly facilitated the data processing capacity, enabling a thrilling development of novel applications, such as video surveillance and autonomous driving. The convergence of edge computing and deep learning is believed to bring new possibilities to both interdisciplinary researches and industrial applications. In this article, we provide a comprehensive survey of the latest efforts on the deep-learning-enabled edge computing applications and particularly offer insights on how to leverage the deep learning advances to facilitate edge applications from four domains, i.e., smart multimedia, smart transportation, smart city, and smart industry. We also highlight the key research challenges and promising research directions therein. We believe this survey will inspire more researches and contributions in this promising field.

Document type: 
Article
File(s): 

Deconvoluting the Diversity of Within-host Pathogen Strains in a Multi-locus Sequence Typing Framework

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2019-12-17
Abstract: 

Background

Bacterial pathogens exhibit an impressive amount of genomic diversity. This diversity can be informative of evolutionary adaptations, host-pathogen interactions, and disease transmission patterns. However, capturing this diversity directly from biological samples is challenging.

Results

We introduce a framework for understanding the within-host diversity of a pathogen using multi-locus sequence types (MLST) from whole-genome sequencing (WGS) data. Our approach consists of two stages. First we process each sample individually by assigning it, for each locus in the MLST scheme, a set of alleles and a proportion for each allele. Next, we associate to each sample a set of strain types using the alleles and the strain proportions obtained in the first step. We achieve this by using the smallest possible number of previously unobserved strains across all samples, while using those unobserved strains which are as close to the observed ones as possible, at the same time respecting the allele proportions as closely as possible. We solve both problems using mixed integer linear programming (MILP). Our method performs accurately on simulated data and generates results on a real data set of Borrelia burgdorferi genomes suggesting a high level of diversity for this pathogen.

Conclusions

Our approach can apply to any bacterial pathogen with an MLST scheme, even though we developed it with Borrelia burgdorferi, the etiological agent of Lyme disease, in mind. Our work paves the way for robust strain typing in the presence of within-host heterogeneity, overcoming an essential challenge currently not addressed by any existing methodology for pathogen genomics.

Document type: 
Article
File(s): 

A New Resolution Function to Evaluate Tree Shape Statistics

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2019-11-21
Abstract: 

Phylogenetic trees are frequently used in biology to study the relationships between a number of species or organisms. The shape of a phylogenetic tree contains useful information about patterns of speciation and extinction, so powerful tools are needed to investigate the shape of a phylogenetic tree. Tree shape statistics are a common approach to quantifying the shape of a phylogenetic tree by encoding it with a single number. In this article, we propose a new resolution function to evaluate the power of different tree shape statistics to distinguish between dissimilar trees. We show that the new resolution function requires less time and space in comparison with the previously proposed resolution function for tree shape statistics. We also introduce a new class of tree shape statistics, which are linear combinations of two existing statistics that are optimal with respect to a resolution function, and show evidence that the statistics in this class converge to a limiting linear combination as the size of the tree increases. Our implementation is freely available at https://github.com/WGS-TB/TreeShapeStats.

Document type: 
Article
File(s): 

On the Rank-Distance Median of 3 Permutations

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2018-05-08
Abstract: 

Background

Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper.

In their paper, the authors provide an O(n3)">O(n3)O(n3) algorithm for determining three candidate medians, prove the tight approximation ratio 43">4343, and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.

Results

In this paper, we extend their results and provide the following:

  • Three invariants characterizing the problem of finding the median of 3 matrices

  • A sufficient condition for uniqueness of medians that can be checked in O(n)

  • A faster, O(n2)">O(n2)O(n2) algorithm for determining the median under this condition

  • A new heuristic algorithm for this problem based on compressed sensing

  • A O(n4)">O(n4)O(n4) algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.

Conclusions

Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.

Document type: 
Article
File(s): 

Smart Home Based on WiFi Sensing: A Survey

Peer reviewed: 
Yes, item is peer reviewed.
Date created: 
2018-03-07
Abstract: 

Conventional sensing methodologies for smart home are known to be labor-intensive and complicated for practical deployment. Thus, researchers are resorting to alternative sensing mechanisms. Wi-Fi is one of the key technologies that enable connectivity for smart home services. Apart from its primary use for communication, Wi-Fi signal has now been widely leveraged for various sensing tasks, such as gesture recognition and fall detection, due to its sensitivity to environmental dynamics. Building smart home based on Wi-Fi sensing is cost-effective, non-invasive, and enjoys convenient deployment. In this paper, we survey the recent advances in the smart home systems based on the Wi-Fi sensing, mainly in such areas as health monitoring, gesture recognition, contextual information acquisition, and authentication.

Document type: 
Article
File(s):