Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Probabilistic Analysis of Distributed Processes with Focus on Consensus

Date created: 
2017-09-22
Abstract: 

This thesis is devoted to the study of stochastic decentralized processes. Typical examples in the real world include the dynamics of weather and temperature, of traffic, the way we meet our friends, etc. We take the rich tool set from probability theory for the analysis of Markov Chains and employ it to study a wide range of such distributed processes: Forest Fire Model (social networks), Balls-into-Bins with Deleting Bins, and fundamental consensus dynamics and protocols such as the Voter Model, 2-Choices, and 3-Majority.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Petra Berenbrink
Claire Mathieu
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Speed versus accuracy in neural sequence tagging for natural language processing

Author: 
Date created: 
2017-09-12
Abstract: 

Sequence Tagging, including part of speech tagging, chunking and named entity recognition, is an important task in NLP. Recurrent neural network models such as Bidirectional LSTMs have produced impressive results on sequence tagging. In this work, we first present a Bidirectional LSTM neural network model for sequence tagging tasks. Then we show a simple and fast greedy sequence tagging system using a feedforward neural network. We compare the speed and accuracy between the Bidirectional LSTM model and the greedy feedforward model. In addition, we propose two new models based on Mention2Vec by Stratos (2016): Feedforward-Mention2Vec for named entity recognition and chunking, and BPE-Mention2Vec for part-of-speech tagging. Feedforward-Mention2Vec predicts tag boundaries and corresponding types separately. BPE-Mention2Vec uses the Byte Pair Encoding algorithm to segment words first and then predicts the part-of-speech tags for the subword spans. We carefully design the experiments to demonstrate the speed and accuracy trade-off in different models. The empirical results reveal that the greedy feedforward model can achieve comparable accuracy and faster speed than recurrent models for sequence tagging, and Feedforward-Mention2Vec is competitive with the fully structured BiLSTM model for named entity recognition while being more scalable in the number of named entity types.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Computational Discovery of Splicing Events from High-Throughput Omics Data

Author: 
Date created: 
2017-07-18
Abstract: 

The splicing mechanism, the process of forming mature messenger RNA (mRNA) by only concatenating exons and removing introns, is an essential step in gene expression. It allows a single gene to have multiple RNA isoforms which potentially code different proteins. In addition, aberrant transcripts generated from non-canonical splicing events (e.g. gene fusions) are believed to be potential drivers in many tumor types and human diseases. Thus, identification and quantification of expressed RNAs from RNA-Seq data become fundamental steps in many clinical studies. For that reason, number of methods have been developed. Most popular computational methods designed for these high-throughput omics data start by analyzing the datasets based on existing gene annotations. However, these tools (i) do not detect novel RNA isoforms and low abundance transcripts; (ii) do not incorporate multi-mapping reads in their read counting strategies in quantifications; (iii) are sensitive to sequencing artifacts. In this thesis, we will address these computational problems for analyzing splicing events from high-throughput omics data. For identification and quantification of expressed RNAs from RNA-Seq data, we introduce CLIIQ, a unified framework to solve these two problems simultaneously. This framework also supports data from multiple samples to improve accuracy. To better incorporate multi-mapping reads into the framework, we design ORMAN, a combinatorial optimization formulation to resolve their mapping ambiguity by assigning single best location for each read. For aberrant transcript detections, we present a computational strategy ProTIE to integratively analyze proteomics and transcriptomic data from the same individual. This strategy provides proteome-level evidence for aberrant transcripts that can be used to eliminate false positives reported solely based on sequencing data.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Cenk Sahinalp
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Feature-based Comparison of Flow Cytometry Data

Author: 
Date created: 
2017-08-08
Abstract: 

Flow cytometry (FCM) bioinformatics is a sub-field of bioinformatics, aimed at developing effective and efficient computational tools to store, organize, and analyze high-throughput/dimensional FCM data. Flow cytometers are capable of analyzing thousands of cells per second for up to 40 features. These features primarily signal the presence of different proteins on cells in the bloodstream. Hence contributing large amounts of data towards the big biological data paradigm. The data that a flow cytometer outputs from a biological sample, is called a FCS file.The International Mouse Phenotyping Consortium (IMPC) is a collaboration between 23 international institutions and funding organizations. Its aim is to decipher the function of 20,000 mouse genes. IMPC is doing so by breeding mice with a certain gene knocked out (KO), cancelling the function of that gene. In turn, FCM is used to measure the immunological changes correlated to this knockout. Many tools exist to classify FCS files. However, there is a lack of tools to conduct unsupervised clustering of FCS files. One goal of IMPC is to compare and contrast KO genes, hence IMPC becomes a prime motivation for this problem. As such, this thesis outlines a data processing pipeline used to isolate features for each FCS file. We then test the different types of features extracted on a benchmark data set from the FlowCAP-II challenge, containing data from healthy persons and patients with AML (acute myeloid leukemia). We then evaluate how well these features separate out FCS files of different origin (i.e. healthy vs AML).

Document type: 
Thesis
File(s): 
Senior supervisor: 
Cedric Chauve
Ryan Brinkman
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Machine Learning Driven Active Surfaces for 3D Segmentation of Tumour Lesions in PET Images

Author: 
Date created: 
2017-05-09
Abstract: 

One of the key challenges facing wider adoption of positron emission tomography (PET) as an imaging biomarker of disease is the development of reproducible quantitative image interpretation tools. Quantifying changes in tumor tissue, due to disease progression or treatment regimen, often requires accurate and reproducible delineation of lesions. Lesion segmentation is necessary for measuring tumor proliferation/shrinkage and radiotracer-uptake to quantify tumor metabolism.In this thesis, we develop an active surface model for segmenting lesions from PET images. We first implemented a non-convex level set active surface method with likelihood terms trained on manually-collected seeds points. We evaluated this approach on the following datasets: Images of phantoms collected by our collaborators at UBC; Quantitative Imaging Network (QIN) phantom images; and images of real patients from the QIN Head and Neck challenge. Secondly, to avoid user interaction, we developed an improved version of our method by training a machine learning system on anatomically and physiologically meaningful imaging cues to distinguish normal organ activity from tumorous lesion activity. Then, the inferred lesion likelihoods are used to guide a convex active surface segmentation model. The result is a lesion segmentation method that does not require user-initialization, manual seeding, or parameter-tweaking and, thus, guaranteeing reproducible results. We tested this enhanced method on data from the Cancer Imaging Archive. Our method not only produces more accurate segmentation than state-of-the-art segmentation results, but also does not need any user interaction.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Ghassan Hamarneh
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Event-based control as a cloud service

Date created: 
2017-04-19
Abstract: 

Event-based control has gained significant interest from the research community in recent years because it allows better resource utilization in networked control systems. In this thesis, we propose an architecture for offering event-based control as a service from the cloud, which not only improves resource utilization but also reduces the cost and setup time of large-scale industrial automation systems. Providing event-based control from the cloud, however, poses multiple research challenges. We address two of the main challenges, which are mitigating long and variable network delays and handling controller failures introduced because of moving the controller far away from the plant. We propose novel methods to solve the delay and failure problems and we show that these methods maintain the stability and performance of the control system. We implemented the proposed methods and deployed them on the Amazon cloud. Our results show that our delay mitigation technique can handle large communication delays up to several seconds with practically zero effect on the main performance metrics of the system. Moreover, the proposed fault tolerance approach can transparently handle controller failures even if the controlled system is thousands of miles away from its cloud controllers.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Mohamed Hefeeda
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Global Structure-from-Motion and Its Application

Author: 
Date created: 
2017-07-06
Abstract: 

Structure-from-motion (SfM) is a fundamental problem in 3D computer vision, with the aim of recovering camera poses and 3D scene structure simultaneously given a set of 2D images. SfM methods can be broadly divided into incremental and global methods according to their ways to register cameras. Incremental methods register cameras one by one, while global SfM methods solve all cameras simultaneously from all available relative motions. As a result, global SfM has better potential in both reconstruction accuracy and computation efficiency than incremental SfM. In this thesis, we address two challenges of global SfM. Our goal is to propose a robust and efficient global SfM system which is applicable to all kinds of motions and datasets. The first challenge is that translation averaging in global SfM is difficult, since the input relative motion between two cameras doesn’t encode the scale information. Therefore, many existing global SfM methods don’t work for the data whose measurement graph is not parallel rigid, e.g. all cameras on the same line. To tackle this challenge, we propose a global SfM method based on a novel linear relationship within camera triplets. Our formulation encodes the scale information by the baseline length ratios within the camera triplet, which helps deal with the collinear camera motion. We further extend the linear relationship within camera triplets to linear constraints for cameras seeing a common scene point, which can improve the global translation estimation for the data with weak image association. The second challenge is that global SfM methods are fragile on noisy data, and one incorrect pairwise relationship may distort the result greatly as global SfM considers all relative relationships together. To deal with this challenge, we propose a novel global SfM pipeline where camera registration is formulated as a well-posed similarity averaging problem solved robustly with L1 optimization. What’s more, the novel pipeline makes the filtering of noisy relative poses simple and effective, which can further improve the robustness of global SfM. We show the effectiveness of our global SfM system by applying it into the video alignment problem which aims to find per-pixel correspondences between two video sequences in both spatial and temporal dimensions. Guided by the 3D information from global SfM, the proposed video registration method can align videos taken at different times with substantially different appearances, in the presence of moving objects and moving cameras with slightly different trajectories.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Ping Tan
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Recommendation in Social Media: Utilizing Relationships among Users to Enhance Personalized Recommendation

Author: 
Date created: 
2017-06-29
Abstract: 

Recommender systems are ubiquitous in our digital life in recent years. They play a significant role in numerous Internet services and applications such as electronic commerce (Amazon and eBay), on-demand video streaming (Netflix and Hulu). A key task in recommender systems is to model user preferences and to suggest, for each user, a personalized list of items that the user has not experienced, but are deemed highly relevant to her. Many of these recommendation algorithms are based on the principle of collaborative filtering, suggesting items that similar users have consumed. With the advent of online social networks, social recommendation has become one of the most popular research topics in recommender systems, exploiting the effects of social influence and selection in social networks, where user relationships are explicit, i.e., there will be an edge connecting two users if they are friends. In addition, more information about the relationships between users in social media becomes available with the rapid development of various Internet services. For example, more and more online web services are providing mechanisms by which users can self-organize into groups with other users having similar opinions or interests, enabling us to analyze the interactions between users with others insides/outsides groups, as well as the engagement between users and groups. User relationships in these applications are usually implicit and can only be utilized indirectly for recommendation tasks. In this thesis, we focus on utilizing user relationships (either explicit or implicit) to enhance personalized recommendation in social media. We study three problems of recommendation in social media, i.e., recommendation with strong and weak ties, social group recommendation and interactive social recommendation in an online setting. We propose to improve social recommendation by incorporating the concept of strong and weak ties which are two well documented terms in the social sciences, boost the performance of social group recommendation through modeling the temporal dynamics of engagement of users with groups, and tackle the interactive social recommendation problem via employing the exploitation-exploration strategy in an online setting. Our proposed models are all compared with state-of-the-art baselines on several real-world datasets.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Query processing of schema design problems for data-driven renormalization

Author: 
Date created: 
2017-10-16
Abstract: 

In the past decades, more and more information has been stored or delivered in non-relational data models—either in NoSQL databases or via a Software as a Service (SaaS) application. Users often want to load these data sets into a BI application or a relational database for further analysis. The data-driven renormalization framework is often used to transform non-relational data into relational data. In this thesis, we explore how to help users to make design decisions in such a framework. We formally define two kinds of queries—the point query and the stable interval query—to help users making design decisions. We propose two index structures, which can represent a list of FDs concisely but also process the queries efficiently. We conduct experiments on two real datasets and show that our algorithms greatly outperform the baseline method when processing a large set of FDs.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jian Pei
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

List homomorphism to irreflexive oriented trees

Author: 
Date created: 
2017-10-17
Abstract: 

Min ordering of a digraph $H$ plays an important role in deciding the existence of a list homomorphism to $H$. For reflexive oriented trees $T$, there exists a concrete forbidden induced subgraph characterization to have a min ordering. For irreflexive oriented trees $T$, the existence of a min-ordering turned out to be somewhat harder, as there are many types of obstructions to its existence. In this thesis, we first review the existing results for list homomorphism problems $LHOM(H)$ for digraphs and graphs. Second, for a specific subclass of irreflexive oriented trees, we present a concrete forbidden induced subgraph characterization to have a min ordering and to have an obstruction called invertible pair ($I$-$pair$) and digraph asteroidal triple ($DAT$). Moreover, for this subclass of irreflexive oriented trees $T$, we show that if $T$ contains one of the forbidden obstructions, then the problem $LHOM(T)$ is $NP$-complete, and is polynomial otherwise. Third, we discuss general trees, and present some approaches to find the minimal forbidden obstructions in the general case.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Pavol Hell
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.