Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Computational Discovery of Splicing Events from High-Throughput Omics Data

Author: 
Date created: 
2017-07-18
Abstract: 

The splicing mechanism, the process of forming mature messenger RNA (mRNA) by only concatenating exons and removing introns, is an essential step in gene expression. It allows a single gene to have multiple RNA isoforms which potentially code different proteins. In addition, aberrant transcripts generated from non-canonical splicing events (e.g. gene fusions) are believed to be potential drivers in many tumor types and human diseases. Thus, identification and quantification of expressed RNAs from RNA-Seq data become fundamental steps in many clinical studies. For that reason, number of methods have been developed. Most popular computational methods designed for these high-throughput omics data start by analyzing the datasets based on existing gene annotations. However, these tools (i) do not detect novel RNA isoforms and low abundance transcripts; (ii) do not incorporate multi-mapping reads in their read counting strategies in quantifications; (iii) are sensitive to sequencing artifacts. In this thesis, we will address these computational problems for analyzing splicing events from high-throughput omics data. For identification and quantification of expressed RNAs from RNA-Seq data, we introduce CLIIQ, a unified framework to solve these two problems simultaneously. This framework also supports data from multiple samples to improve accuracy. To better incorporate multi-mapping reads into the framework, we design ORMAN, a combinatorial optimization formulation to resolve their mapping ambiguity by assigning single best location for each read. For aberrant transcript detections, we present a computational strategy ProTIE to integratively analyze proteomics and transcriptomic data from the same individual. This strategy provides proteome-level evidence for aberrant transcripts that can be used to eliminate false positives reported solely based on sequencing data.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Cenk Sahinalp
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Feature-based Comparison of Flow Cytometry Data

Author: 
Date created: 
2017-08-08
Abstract: 

Flow cytometry (FCM) bioinformatics is a sub-field of bioinformatics, aimed at developing effective and efficient computational tools to store, organize, and analyze high-throughput/dimensional FCM data. Flow cytometers are capable of analyzing thousands of cells per second for up to 40 features. These features primarily signal the presence of different proteins on cells in the bloodstream. Hence contributing large amounts of data towards the big biological data paradigm. The data that a flow cytometer outputs from a biological sample, is called a FCS file.The International Mouse Phenotyping Consortium (IMPC) is a collaboration between 23 international institutions and funding organizations. Its aim is to decipher the function of 20,000 mouse genes. IMPC is doing so by breeding mice with a certain gene knocked out (KO), cancelling the function of that gene. In turn, FCM is used to measure the immunological changes correlated to this knockout. Many tools exist to classify FCS files. However, there is a lack of tools to conduct unsupervised clustering of FCS files. One goal of IMPC is to compare and contrast KO genes, hence IMPC becomes a prime motivation for this problem. As such, this thesis outlines a data processing pipeline used to isolate features for each FCS file. We then test the different types of features extracted on a benchmark data set from the FlowCAP-II challenge, containing data from healthy persons and patients with AML (acute myeloid leukemia). We then evaluate how well these features separate out FCS files of different origin (i.e. healthy vs AML).

Document type: 
Thesis
File(s): 
Senior supervisor: 
Cedric Chauve
Ryan Brinkman
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Machine Learning Driven Active Surfaces for 3D Segmentation of Tumour Lesions in PET Images

Author: 
Date created: 
2017-05-09
Abstract: 

One of the key challenges facing wider adoption of positron emission tomography (PET) as an imaging biomarker of disease is the development of reproducible quantitative image interpretation tools. Quantifying changes in tumor tissue, due to disease progression or treatment regimen, often requires accurate and reproducible delineation of lesions. Lesion segmentation is necessary for measuring tumor proliferation/shrinkage and radiotracer-uptake to quantify tumor metabolism.In this thesis, we develop an active surface model for segmenting lesions from PET images. We first implemented a non-convex level set active surface method with likelihood terms trained on manually-collected seeds points. We evaluated this approach on the following datasets: Images of phantoms collected by our collaborators at UBC; Quantitative Imaging Network (QIN) phantom images; and images of real patients from the QIN Head and Neck challenge. Secondly, to avoid user interaction, we developed an improved version of our method by training a machine learning system on anatomically and physiologically meaningful imaging cues to distinguish normal organ activity from tumorous lesion activity. Then, the inferred lesion likelihoods are used to guide a convex active surface segmentation model. The result is a lesion segmentation method that does not require user-initialization, manual seeding, or parameter-tweaking and, thus, guaranteeing reproducible results. We tested this enhanced method on data from the Cancer Imaging Archive. Our method not only produces more accurate segmentation than state-of-the-art segmentation results, but also does not need any user interaction.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Ghassan Hamarneh
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Event-based control as a cloud service

Date created: 
2017-04-19
Abstract: 

Event-based control has gained significant interest from the research community in recent years because it allows better resource utilization in networked control systems. In this thesis, we propose an architecture for offering event-based control as a service from the cloud, which not only improves resource utilization but also reduces the cost and setup time of large-scale industrial automation systems. Providing event-based control from the cloud, however, poses multiple research challenges. We address two of the main challenges, which are mitigating long and variable network delays and handling controller failures introduced because of moving the controller far away from the plant. We propose novel methods to solve the delay and failure problems and we show that these methods maintain the stability and performance of the control system. We implemented the proposed methods and deployed them on the Amazon cloud. Our results show that our delay mitigation technique can handle large communication delays up to several seconds with practically zero effect on the main performance metrics of the system. Moreover, the proposed fault tolerance approach can transparently handle controller failures even if the controlled system is thousands of miles away from its cloud controllers.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Mohamed Hefeeda
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Global Structure-from-Motion and Its Application

Author: 
Date created: 
2017-07-06
Abstract: 

Structure-from-motion (SfM) is a fundamental problem in 3D computer vision, with the aim of recovering camera poses and 3D scene structure simultaneously given a set of 2D images. SfM methods can be broadly divided into incremental and global methods according to their ways to register cameras. Incremental methods register cameras one by one, while global SfM methods solve all cameras simultaneously from all available relative motions. As a result, global SfM has better potential in both reconstruction accuracy and computation efficiency than incremental SfM. In this thesis, we address two challenges of global SfM. Our goal is to propose a robust and efficient global SfM system which is applicable to all kinds of motions and datasets. The first challenge is that translation averaging in global SfM is difficult, since the input relative motion between two cameras doesn’t encode the scale information. Therefore, many existing global SfM methods don’t work for the data whose measurement graph is not parallel rigid, e.g. all cameras on the same line. To tackle this challenge, we propose a global SfM method based on a novel linear relationship within camera triplets. Our formulation encodes the scale information by the baseline length ratios within the camera triplet, which helps deal with the collinear camera motion. We further extend the linear relationship within camera triplets to linear constraints for cameras seeing a common scene point, which can improve the global translation estimation for the data with weak image association. The second challenge is that global SfM methods are fragile on noisy data, and one incorrect pairwise relationship may distort the result greatly as global SfM considers all relative relationships together. To deal with this challenge, we propose a novel global SfM pipeline where camera registration is formulated as a well-posed similarity averaging problem solved robustly with L1 optimization. What’s more, the novel pipeline makes the filtering of noisy relative poses simple and effective, which can further improve the robustness of global SfM. We show the effectiveness of our global SfM system by applying it into the video alignment problem which aims to find per-pixel correspondences between two video sequences in both spatial and temporal dimensions. Guided by the 3D information from global SfM, the proposed video registration method can align videos taken at different times with substantially different appearances, in the presence of moving objects and moving cameras with slightly different trajectories.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Ping Tan
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Recommendation in Social Media: Utilizing Relationships among Users to Enhance Personalized Recommendation

Author: 
Date created: 
2017-06-29
Abstract: 

Recommender systems are ubiquitous in our digital life in recent years. They play a significant role in numerous Internet services and applications such as electronic commerce (Amazon and eBay), on-demand video streaming (Netflix and Hulu). A key task in recommender systems is to model user preferences and to suggest, for each user, a personalized list of items that the user has not experienced, but are deemed highly relevant to her. Many of these recommendation algorithms are based on the principle of collaborative filtering, suggesting items that similar users have consumed. With the advent of online social networks, social recommendation has become one of the most popular research topics in recommender systems, exploiting the effects of social influence and selection in social networks, where user relationships are explicit, i.e., there will be an edge connecting two users if they are friends. In addition, more information about the relationships between users in social media becomes available with the rapid development of various Internet services. For example, more and more online web services are providing mechanisms by which users can self-organize into groups with other users having similar opinions or interests, enabling us to analyze the interactions between users with others insides/outsides groups, as well as the engagement between users and groups. User relationships in these applications are usually implicit and can only be utilized indirectly for recommendation tasks. In this thesis, we focus on utilizing user relationships (either explicit or implicit) to enhance personalized recommendation in social media. We study three problems of recommendation in social media, i.e., recommendation with strong and weak ties, social group recommendation and interactive social recommendation in an online setting. We propose to improve social recommendation by incorporating the concept of strong and weak ties which are two well documented terms in the social sciences, boost the performance of social group recommendation through modeling the temporal dynamics of engagement of users with groups, and tackle the interactive social recommendation problem via employing the exploitation-exploration strategy in an online setting. Our proposed models are all compared with state-of-the-art baselines on several real-world datasets.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Query processing of schema design problems for data-driven renormalization

Author: 
Date created: 
2017-10-16
Abstract: 

In the past decades, more and more information has been stored or delivered in non-relational data models—either in NoSQL databases or via a Software as a Service (SaaS) application. Users often want to load these data sets into a BI application or a relational database for further analysis. The data-driven renormalization framework is often used to transform non-relational data into relational data. In this thesis, we explore how to help users to make design decisions in such a framework. We formally define two kinds of queries—the point query and the stable interval query—to help users making design decisions. We propose two index structures, which can represent a list of FDs concisely but also process the queries efficiently. We conduct experiments on two real datasets and show that our algorithms greatly outperform the baseline method when processing a large set of FDs.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jian Pei
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

List homomorphism to irreflexive oriented trees

Author: 
Date created: 
2017-10-17
Abstract: 

Min ordering of a digraph $H$ plays an important role in deciding the existence of a list homomorphism to $H$. For reflexive oriented trees $T$, there exists a concrete forbidden induced subgraph characterization to have a min ordering. For irreflexive oriented trees $T$, the existence of a min-ordering turned out to be somewhat harder, as there are many types of obstructions to its existence. In this thesis, we first review the existing results for list homomorphism problems $LHOM(H)$ for digraphs and graphs. Second, for a specific subclass of irreflexive oriented trees, we present a concrete forbidden induced subgraph characterization to have a min ordering and to have an obstruction called invertible pair ($I$-$pair$) and digraph asteroidal triple ($DAT$). Moreover, for this subclass of irreflexive oriented trees $T$, we show that if $T$ contains one of the forbidden obstructions, then the problem $LHOM(T)$ is $NP$-complete, and is polynomial otherwise. Third, we discuss general trees, and present some approaches to find the minimal forbidden obstructions in the general case.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Pavol Hell
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

An algorithmic study of kernel contraction in EL

Author: 
Date created: 
2017-09-13
Abstract: 

Kernel contraction is an interesting problem that can be considered a step towards belief revision. Kernels were introduced as a tool to determine why a given belief is accepted by the knowledge base. The aim of using kernels is to invalidate the reasons why that given belief is accepted, and hence rejecting that belief. We use Description Logic EL for two reasons: it is used in some large knowledge base applications, and it has a polynomial-time reasoning algorithm. In this study we introduce an algorithm that performs kernel contraction by reduction to the network-flow problem. We evaluate the rationality of the algorithm by applying postulates that govern kernel contraction. We also explain two heuristics: localization and specificity, that can be used to arrive at more reasonable and common-sense solutions. We will also be focusing on the complexity of the algorithms as an indicator of their feasibility.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Joint Constrained Clustering and Feature Learning based on Deep Neural Networks

Author: 
Date created: 
2017-08-23
Abstract: 

We propose a novel method to iteratively improve the performance of constrained clustering and feature learning based on Convolutional Neural Networks (CNNs). There is no effective strategy for neither the constraint selection nor the distance metric learning in traditional constrained clustering methods. In our work, we design an effective constraint selection strategy and combine a CNN-based feature learning approach with the constrained clustering algorithm. The proposed model consists of two iterative steps: First, we replace the random constraint selection strategy with a carefully designed one; based on the clustering result and constraints obtained, we fine tune the CNN and extract new features for distance re-calculation. Our model is evaluated on a realistic video dataset, and the experimental results demonstrate that our method can improve the constrained clustering performance and feature divisibility simultaneously even with fewer constraints.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Greg Mori
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.