# Computing Science, School of

## On the Rank-Distance Median of 3 Permutations

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2018-05-08
Abstract:

Background

Recently, Pereira Zanetti, Biller and Meidanis have proposed a new definition of a rearrangement distance between genomes. In this formulation, each genome is represented as a matrix, and the distance d is the rank distance between these matrices. Although defined in terms of matrices, the rank distance is equal to the minimum total weight of a series of weighted operations that leads from one genome to the other, including inversions, translocations, transpositions, and others. The computational complexity of the median-of-three problem according to this distance is currently unknown. The genome matrices are a special kind of permutation matrices, which we study in this paper.

In their paper, the authors provide an O(n3)">O(n3)O(n3) algorithm for determining three candidate medians, prove the tight approximation ratio 43">4343, and provide a sufficient condition for their candidates to be true medians. They also conduct some experiments that suggest that their method is accurate on simulated and real data.

Results

In this paper, we extend their results and provide the following:

• Three invariants characterizing the problem of finding the median of 3 matrices

• A sufficient condition for uniqueness of medians that can be checked in O(n)

• A faster, O(n2)">O(n2)O(n2) algorithm for determining the median under this condition

• A new heuristic algorithm for this problem based on compressed sensing

• A O(n4)">O(n4)O(n4) algorithm that exactly solves the problem when the inputs are orthogonal matrices, a class that includes both permutations and genomes as special cases.

Conclusions

Our work provides the first proof that, with respect to the rank distance, the problem of finding the median of 3 genomes, as well as the median of 3 permutations, is exactly solvable in polynomial time, a result which should be contrasted with its NP-hardness for the DCJ (double cut-and-join) distance and most other families of genome rearrangement operations. This result, backed by our experimental tests, indicates that the rank distance is a viable alternative to the DCJ distance widely used in genome comparisons.

Document type:
Article
File(s):

## Smart Home Based on WiFi Sensing: A Survey

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2018-03-07
Abstract:

Conventional sensing methodologies for smart home are known to be labor-intensive and complicated for practical deployment. Thus, researchers are resorting to alternative sensing mechanisms. Wi-Fi is one of the key technologies that enable connectivity for smart home services. Apart from its primary use for communication, Wi-Fi signal has now been widely leveraged for various sensing tasks, such as gesture recognition and fall detection, due to its sensitivity to environmental dynamics. Building smart home based on Wi-Fi sensing is cost-effective, non-invasive, and enjoys convenient deployment. In this paper, we survey the recent advances in the smart home systems based on the Wi-Fi sensing, mainly in such areas as health monitoring, gesture recognition, contextual information acquisition, and authentication.

Document type:
Article
File(s):

## A Study of Chained Stochastic Tracking in RGB and Depth Sensing

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2018-01-30
Abstract:

This paper studies the notion of hierarchical (chained) structure of stochastic tracking of marked feature points while a person is moving in the field of view of a RGB and depth sensor. The objective is to explore how the information between the two sensing modalities (namely, RGB sensing and depth sensing) can be cascaded in order to distribute and share the implicit knowledge associated with the tracking environment. In the first layer, the prior estimate of the state of the object is distributed based on the novel expected motion constraints approach associated with the movements. For the second layer, the segmented output resulting from the RGB image is used for tracking marked feature points of interest in the depth image of the person. Here we proposed two approaches for associating a measure (weight) for the distribution of the estimates (particles) of the tracking feature points using depth data. The first measure is based on the notion of spin-image and the second is based on the geodesic distance. The paper presents the overall implementation of the proposed method combined with some case study results.

Document type:
Article
File(s):

## Uncovering the Subtype-Specific Temporal Order of Cancer Pathway Dysregulation

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-11-11
Abstract:

Cancer is driven by genetic mutations that dysregulate pathways important for proper cell function. Therefore, discovering these cancer pathways and their dysregulation order is key to understanding and treating cancer. However, the heterogeneity of mutations between different individuals makes this challenging and requires that cancer progression is studied in a subtype-specific way. To address this challenge, we provide a mathematical model, called Subtype-specific Pathway Linear Progression Model (SPM), that simultaneously captures cancer subtypes and pathways and order of dysregulation of the pathways within each subtype. Experiments with synthetic data indicate the robustness of SPM to problem specifics including noise compared to an existing method. Moreover, experimental results on glioblastoma multiforme and colorectal adenocarcinoma show the consistency of SPM’s results with the existing knowledge and its superiority to an existing method in certain cases. The implementation of our method is available at https://github.com/Dalton386/SPM.

Document type:
Article
File(s):

## Datasets for Face and Object Detection in Fisheye Images

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-11-02
Abstract:

We present two new fisheye image datasets for training object and face detection models: VOC-360 and Wider-360. The fisheye images are created by post-processing regular images collected from two well-known datasets, VOC2012 and Wider Face, using a model for mapping regular to fisheye images implemented in Matlab. VOC-360 contains 39,575 fisheye images for object detection, segmentation, and classification. Wider-360 contains 63,897 fisheye images for face detection. These datasets will be useful for developing face and object detectors as well as segmentation modules for fisheye images while the efforts to collect and manually annotate true fisheye images are underway.

Document type:
Article
File(s):

## Caveolae and Scaffold Detection from Single Molecule Localization Microscopy Data Using Deep Learning

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-08-26
Abstract:

Caveolae are plasma membrane invaginations whose formation requires caveolin-1 (Cav1), the adaptor protein polymerase I, and the transcript release factor (PTRF or CAVIN1). Caveolae have an important role in cell functioning, signaling, and disease. In the absence of CAVIN1/PTRF, Cav1 forms non-caveolar membrane domains called scaffolds. In this work, we train machine learning models to automatically distinguish between caveolae and scaffolds from single molecule localization microscopy (SMLM) data. We apply machine learning algorithms to discriminate biological structures from SMLM data. Our work is the first that is leveraging machine learning approaches (including deep learning models) to automatically identifying biological structures from SMLM data. In particular, we develop and compare three binary classification methods to identify whether or not a given 3D cluster of Cav1 proteins is a caveolae. The first uses a random forest classifier applied to 28 hand-crafted/designed features, the second uses a convolutional neural net (CNN) applied to a projection of the point clouds onto three planes, and the third uses a PointNet model, a recent development that can directly take point clouds as its input. We validate our methods on a dataset of super-resolution microscopy images of PC3 prostate cancer cells labeled for Cav1. Specifically, we have images from two cell populations: 10 PC3 and 10 CAVIN1/PTRF-transfected PC3 cells (PC3-PTRF cells) that form caveolae. We obtained a balanced set of 1714 different cellular structures. Our results show that both the random forest on hand-designed features and the deep learning approach achieve high accuracy in distinguishing the intrinsic features of the caveolae and non-caveolae biological structures. More specifically, both random forest and deep CNN classifiers achieve classification accuracy reaching 94% on our test set, while the PointNet model only reached 83% accuracy. We also discuss the pros and cons of the different approaches.

Document type:
Article
File(s):

## A Multi-Labeled Tree Dissimilarity Measure for Comparing “Clonal Trees” of Tumor Progression

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-07-27
Abstract:

We introduce a new dissimilarity measure between a pair of “clonal trees”, each representing the progression and mutational heterogeneity of a tumor sample, constructed by the use of single cell or bulk high throughput sequencing data. In a clonal tree, each vertex represents a specific tumor clone, and is labeled with one or more mutations in a way that each mutation is assigned to the oldest clone that harbors it. Given two clonal trees, our multi-labeled tree dissimilarity (MLTD) measure is defined as the minimum number of mutation/label deletions, (empty) leaf deletions, and vertex (clonal) expansions, applied in any order, to convert each of the two trees to the maximum common tree. We show that the MLTD measure can be computed efficiently in polynomial time and it captures the similarity between trees of different clonal granularity well.

Document type:
Article
File(s):

## A Cubic Algorithm for the Generalized Rank Median of Three Genomes

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2018-09-08
Abstract:

The area of genome rearrangements has given rise to a number of interesting biological, mathematical and algorithmic problems. Among these, one of the most intractable ones has been that of finding the median of three genomes, a special case of the ancestral reconstruction problem. In this work we re-examine our recently proposed way of measuring genome rearrangement distance, namely, the rank distance between the matrix representations of the corresponding genomes, and show that the median of three genomes can be computed exactly in polynomial time   O(nω) , where   ω≤3 , with respect to this distance, when the median is allowed to be an arbitrary orthogonal matrix.

We define the five fundamental subspaces depending on three input genomes, and use their properties to show that a particular action on each of these subspaces produces a median. In the process we introduce the notion of M-stable subspaces. We also show that the median found by our algorithm is always orthogonal, symmetric, and conserves any adjacencies or telomeres present in at least 2 out of 3 input genomes.

We test our method on both simulated and real data. We find that the majority of the realistic inputs result in genomic outputs, and for those that do not, our two heuristics perform well in terms of reconstructing a genomic matrix attaining a score close to the lower bound, while running in a reasonable amount of time. We conclude that the rank distance is not only theoretically intriguing, but also practically useful for median-finding, and potentially ancestral genome reconstruction.

Document type:
Article
File(s):

## jViz.RNA 4.0—Visualizing Pseudoknots and RNA Editing Employing Compressed Tree Graphs

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-05-06
Abstract:

Previously, we have introduced an improved version of jViz.RNA which enabled faster and more stable RNA visualization by employing compressed tree graphs. However, the new RNA representation and visualization method required a sophisticated mechanism of pseudoknot visualization. In this work, we present our novel pseudoknot classification and implementation of pseudoknot visualization in the context of the new RNA graph model. We then compare our approach with other RNA visualization software, and demonstrate jViz.RNA 4.0’s benefits compared to other software. Additionally, we introduce interactive editing functionality into jViz.RNA and demonstrate its benefits in exploring and building RNA structures. The results presented highlight the new high degree of utility jViz.RNA 4.0 now offers. Users are now able to visualize pseudoknotted RNA, manipulate the resulting automatic layouts to suit their individual needs, and change both positioning and connectivity of the RNA molecules examined. Care was taken to limit overlap between structural elements, particularly in the case of pseudoknots to ensure an intuitive and informative layout of the final RNA structure.

Document type:
Article
File(s):

## Integrative Inference of Subclonal Tumour Evolution from Single-Cell and Bulk Sequencing Data

Author:
Peer reviewed:
Yes, item is peer reviewed.
Date created:
2019-06-21
Abstract:

Understanding the clonal architecture and evolutionary history of a tumour poses one of the key challenges to overcome treatment failure due to resistant cell populations. Previously, studies on subclonal tumour evolution have been primarily based on bulk sequencing and in some recent cases on single-cell sequencing data. Either data type alone has shortcomings with regard to this task, but methods integrating both data types have been lacking. Here, we present B-SCITE, the first computational approach that infers tumour phylogenies from combined single-cell and bulk sequencing data. Using a comprehensive set of simulated data, we show that B-SCITE systematically outperforms existing methods with respect to tree reconstruction accuracy and subclone identification. B-SCITE provides high-fidelity reconstructions even with a modest number of single cells and in cases where bulk allele frequencies are affected by copy number changes. On real tumour data, B-SCITE generated mutation histories show high concordance with expert generated trees.

Document type:
Article
File(s):