Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Model-based Outlier Detection for Object-Relational Data

Author: 
Date created: 
2016-12-06
Abstract: 

Outliers are anomalous and interesting objects that are notably different from the rest of the data. The outlier detection task has sometimes been considered as removing noise from the data. However, it is usually the significantly interesting deviations that are of most interest.Different outlier detection techniques work with various data formats. The outlier detection process needs to be sensitive to the nature of the underlying data. Most of the previous work on outlier detection was designed for propositional data. This dissertation focuses on developing outlier detection methods for structured data, more specifically object-relational data. Object-relational data can be viewed as a heterogeneous network with different classes of objects and links.We develop two new approaches to unsupervised outlier detection; both approaches leverage the statistical information obtained from a statistical-relational model. The first method develops a propositionalization approach to summarize information from object-relational data in a single data table.We use Markov Logic Network (MLN) structure learning to construct the features for the single data table and to mitigate the loss of information that usually happens when features are generated by manual aggregation. By using propositionalization as a pipeline, we can apply many previous outlier detection methods that were designed for single-table data.Our second outlier detection method ranks the objects as potential outliers in an object-oriented data model. Our key idea is to compare the feature distribution of a potential outlier object with the feature distribution of the object’s class. We introduce a novel distribution divergence concept that is suitable for outlier detection. Our methods are validated on synthetic datasets and on real-world datasets about soccer matches and movies.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Oliver Schulte
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

End-To-End and Direct Human-Flying Robot Interaction

Date created: 
2016-08-19
Abstract: 

As the application domain of Unmanned Aerial Vehicles (UAV) expands to the consumer market and with recent advances in robot autonomy and ubiquitous computing, a new paradigm for human-UAV interaction has started to form. In this new paradigm, humans and UAV(s) are co-located (situated) and use natural and embodied interfaces to share au- tonomy and communicate. This is in contrast to the traditional paradigm in Human-UAV interaction in which the focus is on designing control interfaces for remotely operated UAVs and sharing autonomy among Human-UAV teams. Motivated by application domains such as wilderness search and rescue and personal filming, we define the required components of end-to-end interaction between a human and a flying robot as interaction initiation (ii) approach and re-positioning to facilitate the interaction and (iii) communication of intent and commands from the human to the UAV and vice versa. In this thesis we introduce the components we designed for creating an end-to-end Human-Flying Robot Interaction sys- tem. Mainly (i) a fast monocular computer vision pipeline for localizing stationary periodic motions in the field of view of a moving camera; (ii) a cascade approach controller that combines appearance based tracking and visual servo control to approach a human using a forward-facing monocular camera; (iii) a close-range gaze and gesture based interaction system for communication of commands from a human to multiple flying UAVs using their on-board monocular camera; and (iv) a light-based feedback system for continuous commu- nication of intents from a flying robot to its interaction partner. We provide experimental results for the performance of each individual component as well as the final integrated sys- tem in real-world Human-UAV Interaction tests. Our interaction system, which integrates all these components, is the first realized end-to-end Human-Flying Robot Interaction sys- tem whereby an uninstrumented user can attract the attention of a distant (20 to 30m) autonomous outdoor flying robot. Once interaction is initiated, the robot approaches the user to close range (≈ 2m), hovers facing the user, then responds appropriately to a small vocabulary of hand gestures, while constantly communicating its states to the user through its embodied feedback system. All the software produced for this thesis is Open Source.

Document type: 
Thesis
Senior supervisor: 
Richard Vaughan
Greg Mori
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Energy profiling and performance optimization for network-related transactions in virtualized cloud

Author: 
Date created: 
2016-12-13
Abstract: 

Networking and machine virtualization play critical roles in the success of modern cloud computing. The energy consumption of physical machines has been carefully examined in the past, including the impact from network traffic. When it comes to virtual machines (VMs) in cloud data centers, it remains unexplored how the highly dynamic traffic affects the energy consumption in virtualized environments. In this thesis, we first present an empirical study on the interplay between energy consumption and network transactions in virtualized environments. Through the real-world measurement on both Xen- and KVM-based platforms, we show that these state-of-the-art designs bring significant overhead on virtualizing network devices and noticeably increase the demand of CPU resources when handling network traffic. Furthermore, the energy consumption varies significantly with traffic allocation strategies and virtual CPU affinity conditions, which was not seen in conventional physical machines. Next, we study the performance and energy efficiency issues when CPU intensive tasks and I/O intensive tasks are co-located inside a VM. A combined effect from device virtualization overhead and VM scheduling latency can cause severe interference in the presence of such hybrid workloads. To this end, we propose Hylics, a novel solution that enables an efficient data traverse path for both I/O and computation operations, and decouples the costly interference. Several important design issues are pinpointed and addressed during our implementation, including efficient intermediate data sharing, network service offloading, and QoS-aware memory usage management. Based on our real-world deployment in KVM, Hylics can improve computation and networking performance with a moderate amount of memory usage. Moreover, this design also sheds new light on optimizing the energy efficiency for virtualized systems.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jiangchuan Liu
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Efficient high throughput sequencing data compression and genotyping methods for clinical environments

Date created: 
2016-12-05
Abstract: 

The rapid development of high throughput sequencing (HTS) technologies has made a considerable impact on clinical and genomics research. These technologies offer a time-efficient and cost-effective means for genotyping many pharmaceutical genes affecting the drug response (also known as ADMER genes), which makes HTS a good candidate for assisting the drug treatment and dosage decisions. However, challenges like data storage and transfer, as well as accurate genotype inference in the presence of various structural variations, are still preventing the wider integration of HTS platforms in clinical environments. For these reasons, this thesis presents fast and efficient methods for HTS data compression and accurate ADMER genotyping.First we propose a novel compression technique for reference-aligned HTS data, which utilizes the local assembly technique to assemble the donor genome and eliminate the redundant information about the donor present in the HTS data. Our results show that we can achieve significantly better compression rates over currently used methods, while providing fast compression speeds and random access capability on the compressed archives. We also present a companion benchmarking framework with the aim to evaluate the performance of different HTS compression tools in a fair and reproducible manner. In the second part, we investigate the genotyping of CYP2D6 gene. Although this gene is involved in the metabolism of 20–25% of all clinically prescribed drugs, accurate genotype inference of CYP2D6 presents a significant challenge for various genotyping platforms due to the presence of structural rearrangements within its region. Thus, we introduce the first computational tool which is able to accurately infer a CYP2D6 genotype from HTS data by formulating such problem as an instance of integer linear programming. Finally, we show how to extend the proposed algorithm to other genes which harbour similar structural rearrangements, like CYP2A6, and to other HTS sequencing platforms, like PGRNseq. We demonstrate the accuracy and effectiveness of the proposed algorithms on large set of simulated and real data samples sequenced by both Illumina and PGRNseq platforms.

Document type: 
Thesis
File(s): 
Senior supervisor: 
S. Cenk Sahinalp
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Non-uniform Knowledge in the Situation Calculus

Date created: 
2016-09-26
Abstract: 

Knowledge Representation and Reasoning is the field of AI concerned with storing information in a way which can be actioned upon by an agent. The situation calculus is a popular logical language for reasoning about action. A prior work by Scherl and Levesque demonstrates how the situation calculus can be used to model knowledge and knowledge-producing actions while solving the frame problem. This approach is limited in that it can only represent knowledge pertaining to the same situation in which it is held. Shapiro et al. have demonstrated how retrospection can be represented, but not so for prospection. We present an extension of Scherl and Levesque’s approach which allows for prospection and reasoning hypothetically about the outcomes of actions before they are taken.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

ExquiMo: An Exquisite Corpse Tool for Co-creative 3D Shape Modeling

Date created: 
2016-12-11
Abstract: 

We introduce a shape modeling tool, ExquiMo, which is guided by the idea of improving the creativity of 3D shape designs through collaboration. Inspired by the game of Exquisite Corpse, our tool allocates distinct parts of a shape to multiple players who model the assigned parts in a sequence. Our approach is motivated by the understanding that effective surprise leads to creative outcomes. Hence, to maintain the surprise factor of the output, we conceal the previously modeled parts from the most recent player. Part designs from individual players are fused together to produce an often unexpected, hence creative, end result. We demonstrate the effectiveness of collaborative modeling for both man-made and natural shapes. Our results show that, when compared to models designed by individual users, multi-user collaborative modeling via ExquiMo tends to lead to more creative designs in terms of the most common criteria used to identify creative artifacts.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Hao Zhang
Daniel Cohen-Or
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

SimBoost: A Read-Across Approach for Drug-Target Interaction Prediction Using Gradient Boosting Machines

Author: 
Date created: 
2016-10-04
Abstract: 

Computational prediction of the interaction between drugs and targets is a standing challenge in drug discovery. High performance on binary drug-target benchmark datasets was reported for a number of methods. A possible drawback of binary data is that missing values and non-interacting drug-target pairs are not differentiated. In this paper, we present a method called SimBoost that predicts the continuous binding affinities of drugs and targets and thus incorporates the whole interaction spectrum from true negative to true positive interactions in the learning phase. Additionally, we propose a version called SimBoostQuant which computes a prediction interval in order to assess the confidence of the predicted affinity. We evaluate SimBoost and SimBoostQuant on three continuous drug-target datasets and show that our methods outperform the state-of-the-art models.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Continuous Conditional Random Fields for Drug Target Interaction Prediction

Author: 
Date created: 
2016-10-03
Abstract: 

Knowledge about the interaction between drugs and proteins is essential in the drug discovery process. Understanding the relationship between compounds and proteins through wetlab experiments alone is time-consuming and costly. To support the experimental work by prioritizing the most potent compounds for a target, numerous methods for the in-silico prediction of drug-target interaction have been proposed and high performance on binary datasets have been reported. A drawback of binary datasets is that missing values and non-interacting drug-target pairs are not differentiated. In this thesis, a model is developed that predicts the drug-target binding strengths as continuous values and thus incorporates the whole interaction spectrum from true negative to true positive interaction. The developed model combines two previously used approaches for the problem, which are Matrix Factorization and Conditional Random Fields. The model is evaluated on three datasets and a slight performance improvement is observed when compared to the state of the art method.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Leveraging Compiler Alias Analysis To Free Accelerators from Load-Store Queues

Author: 
Date created: 
2016-12-06
Abstract: 

Hardware accelerators are an energy efficient alternative to general purpose processors for specific program regions. They have relied on the compiler to extract instruction level parallelism but may waste significant energy in memory disambiguation and discovering memory level parallelism (MLP). Currently, accelerators either i) Define the problem away, and rely on massively parallel programming models [1, 48] to extract MLP. ii) Reuse the Out of Order (OoO) processor [7, 28], and rely on power hungry load-store queues (LSQs) for memory disambiguation, or iii) Serialize – some accelerators [47] focus on program regions where MLP is not important and simply serialize memory operations. We present NACHOS, a compiler assisted energy efficient approach to memory disambiguation, which completely eliminates the need for an LSQ. NACHOS classifies memory operations pairwise into those that don’t alias (i.e., independent memory operations), must alias (i.e., ordering is required between memory operations), and may alias (i.e., compiler is unsure). To enforce program order between must alias memory operations, the compiler inserts ordering edges that are enforced as def-use data dependencies. When the compiler is unsure (i.e., may alias) about a pair of memory operations, the hardware checks if they are independent. We demonstrate that compiler alias analysis with additional refinement can achieve high accuracy for hardware accelerated regions. In our workload suite comprising of SPEC2k, SPEC2k6, and PARSEC workloads; Across 15 applications NACHOS imposes no energy overhead over the function units (i.e., compiler resolves all dependencies), and in another 12 applications NACHOS consumes =17% of function unit energy (max: 53% in povray). Overall NACHOS achieves performance similar to an optimized LSQ and adds an overhead equal to 2.3X of compute energy.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Arrvindh Shriraman
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Algorithms for Scheduling and Routing Problems

Date created: 
2016-11-29
Abstract: 

Optimization has been a central topic in most scientific disciplines for centuries. Continuous optimization has long benefited from well-established techniques of calculus. Discrete optimization, on the other hand, has risen to prominence quite recently. Advances in combinatorial optimization and integer programming in the past few decades, together with the improvement of computer hardware have enabled computer scientists to approach the the problems in this area both theoretically and computationally. However, obtaining the exact solution for many discrete optimization problems remains is still a challenging task, mainly because most of these problems are NP-hard. Under the widespread assumption that P ≠ NP, these problems are intractable from a computational complexity standpoint. Therefore, we should settle for near-optimal solutions. In this thesis, we develop techniques to obtain solutions that are provably close to the optimal for different indivisible resource allocation problems. Indivisible resource allocation encompasses a large class of problems in discrete optimization which can appear in disguise in various theoretical or applied settings. Specifically, we consider two indivisible resource allocation problems. The first one is a variant of the vehicle routing problem known as Skill Vehicle Routing problem, in which the aim is to obtain optimal tours for a fleet of vehicles that provides service to a set of customers. Each of the vehicles possesses a particular set of skills suitable for a subset of the tasks. Each customer, based on the type of service he requires, can only be served by a subset of vehicles. We study this problem computationally and find either the optimal solution or a relatively tight bound on the optimal solution on fairly large problem instances. The second problem involves approximation algorithms for two versions of the classic scheduling problem, the restricted $R||C_{max}$ and the restricted Santa Claus problem. The objective is to design a polynomial time approximation scheme (PTAS) for ordered instances of the two problems. Finally, we consider the class of precedence hierarchies in which the neighborhoods of the processors form Laminar families. We show similar results for a generalization of this model.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Ramesh Krishnamurti
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.