Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Spatial skyline query problem in Euclidean and road-network spaces

Author: 
Date created: 
2020-08-28
Abstract: 

With the growth of data-intensive applications, along with the increase of both size and dimensionality of data, queries with advanced semantics have recently drawn researchers’ attention. Skyline query problem is one of them, which produces optimal results based on user preferences. In this thesis, we study the problem of spatial skyline query in the Euclidean and road network spaces. For a given data set P, we are required to compute the spatial skyline points of P with respect to an arbitrary query set Q. A point p ∈ P is a spatial skyline point if and only if, for any other data point r ∈ P , p is closer to at least one query point q ∈ Q as compared to r and has in the best case the same distance as r to the rest of the query points. We propose several efficient algorithms that outperform the existing algorithms.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Binay Bhattacharya
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Data-driven action-value functions for evaluating players in professional team sports

Author: 
Date created: 
2020-09-11
Abstract: 

As more and larger event stream datasets for professional sports become available, there is growing interest in modeling the complex play dynamics to evaluate player performance. Among these models, a common player evaluation method is assigning values to player actions. Traditional action-values metrics, however, consider very limited game context and player information. Furthermore, they provide directly related to goals (e.g., shots), not all actions. Recent work has shown that reinforcement learning provided powerful methods for addressing quantifying the value of player actions in sports. This dissertation develops deep reinforcement learning (DRL) methods for estimating action values in sports. We make several contributions to DRL for sports. First, we develop neural network architectures that learn an action-value Q-function from sports events logs to estimate each team's expected success given the current match context. Specifically, our architecture models the game history with a recurrent network and predicts the probability that a team scores the next goal. From the learned Q-values, we derive a Goal Impact Metric (GIM) for evaluating a player's performance over a game season. We show that the resulting player rankings are consistent with standard player metrics and temporally consistent within and across seasons. Second, we address the interpretability of the learned Q-values. While neural networks provided accurate estimates, the black-box structure prohibits understanding the influence of different game features on the action values. To interpret the Q-function and understand the influence of game features on action values, we design an interpretable mimic learning framework for the DRL. The framework is based on a Linear Model U-Tree (LMUT) as a transparent mimic model, which facilitates extracting the function rules and computing the feature importance for action values. Third, we incorporate information about specific players into the action values, by introducing a deep player representation framework. In this framework, each player is assigned a latent feature vector called an embedding, with the property that statistically similar players are mapped to nearby embeddings. To compute embeddings that summarize the statistical information about players, we implement a Variational Recurrent Ladder Agent Encoder (VaRLAE) to learn a contextualized representation for when and how players are likely to act. We learn and evaluate deep Q-functions from event data for both ice hockey and soccer. These are challenging continuous-flow games where game context and medium-term consequences are crucial for properly assessing the impact of a player's actions.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Oliver Schulte
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Influence diffusion: A consolidated survey, a unifying transformation, and the greedy algorithm's demonstration of force

Date created: 
2020-08-10
Abstract: 

Influence diffusion concerns the propagation of an entity throughout a network. The naming alludes to the application that motivated the study of this process: the influence that social network users have on each other's opinions. Influence diffusion has a plethora of applications in the real world, ranging from marketing campaigns, to the spread of fake news, to reinforcement learning. In this work, we provide a broad survey of the models proposed for this process in the relevant literature. We consolidate this survey of models by providing missing pieces, i.e. proofs and new models. We show intuitive connections between the introduced models, a unifying transformation between two fundamentally different models, and finally, we show the remarkable performance of the greedy algorithm in a subfield of influence diffusion that has been receiving increasing attention recently, adaptive influence diffusion.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Joseph Peters
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Multi-person video understanding with deep neural networks

Date created: 
2020-08-25
Abstract: 

In this thesis, we present new methods to address multi-person scene understanding. Specifically, we focus on a multi-person task known as group activity recognition. We analyze multi-person scene understanding from the perspective of group activity recognition. We identify key challenges in group activity recognition, and present deep neural networks based approaches to handle these challenges. We show that our proposed approaches achieve competitive performance for group activity recognition. We also study one of the key components of group activity recognition in more detail, that is the problem of sequence modeling, where we apply new sequence modeling methods to the task of dense video captioning. In the end, we also investigate how to compress these large deep neural networks for efficient recognition on specialized domain tasks.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Greg Mori
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

TagFlip: Active mobile music discovery with social tags

Author: 
Date created: 
2020-06-23
Abstract: 

Recently, there has been a substantial surge in the availability of digital music through online subscription services. While these have given the user access to an almost endless variety of content, the sheer size of this choice space can also turn the user's selection process into a burden. While the purely algorithmic approach of recommending items with minimal user control has been the most popular method in commercial systems, several studies have shown that it can suffer from issues such as lack of transparency, limited user control, and pigeonholing the users. On the academic front, novel interfaces have been proposed that strive to alleviate the above issues. However, most such tools target large screens and often depend on complex user interactions. This is in discrepancy with the usual circumstances in which we listen to music, such as during commuting and work, and increasingly on our phones. We report on the design and evaluation of TagFlip. A mobile app featuring a novel interaction framework designed to fit within typical music listening scenarios, organically transforming them into more interactive music discovery experiences. Acknowledging the fleeting nature of music listening in the age of streaming and the smartphone, TagFlip was designed to require little effort from the users, while still granting them a high degree of control over the recommended music. This is done by describing each played song to the user through its most popular social tags and allowing the user to easily specify which of the tags should be considered for upcoming songs. To understand the merits of tag-based music discovery and especially our unique design, we compared TagFlip to Spotify's mobile app, both in a controlled lab experiment, and in the field. In these evaluations, TagFlip came out on top in both subjective user experience (control, transparency, and interaction) and our objective measure of number of interactions per liked song. Our field study participants found it fun and engaging to discover new tags and styles of music they had not heard of or liked before, and our logs indicated that they ended up actively seeking music within these new styles for significant portions of their time with the app. Our users found TagFlip to be an important complementary experience to that of Spotify, enabling more active and directed music discovery with minimal effort. Furthermore, we found that current mainstream approaches to music discovery may be incapable of exploiting the full potential of massive online music libraries.

Document type: 
Thesis
File(s): 
The supplementary video file briefly demonstrates various screens and functions of the TagFlip Android app
Supervisor(s): 
Hao (Richard) Zhang
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Localizing violations of approximate constraints for data error detection

Author: 
Date created: 
2020-08-14
Abstract: 

Error detection is key for data quality management. Leveraging domain knowledge in the form of user-specified constraints is one of the major approaches to error detection. A recent trend in error detection has been utilizing approximate constraints (ACs) that a relation is expected to satisfy only to a certain degree rather than completely. An example are the recently introduced statistical constraints, that allow the user to specify which correlations among attributes she expects to be present or absent in the data. Statistical constraints allow the user to express a broad range of statistical and causal domain knowledge. Extensive empirical investigations indicate that even traditional integrity constraints such as functional dependencies hold only approximately in real-world datasets. Approximate functional dependencies (AFDs) have been a data cleaning tool for some time. This thesis introduces a new technique for enhancing error detection with approximate constraints. Our starting observation is that approximate constraints are context-sensitive: the degree to which they are satisfied depends on the sub-population being considered. An error region is a subset of the data that violates an AC to a higher degree than the data as a whole, and is therefore more likely to contain erroneous records. For example, an error region may contain the set of records from before a certain year, or from a certain location. We describe an efficient algorithm for identifying distinct data regions that violate given ACs to different degrees, based on a recursive tree partitioning scheme. The learned trees describe different error regions in terms of data attributes that are easily interpreted by users (e.g. all records before 2003). This helps to explain to the user why some records were identified as likely errors. After identifying error regions, we can apply error detection methods to each error region separately, rather than to the dataset as a whole. Our empirical evaluation, done using four datasets containing both real world and synthetic errors, shows that identifying error regions increases both precision and recall of error detection based on ACs. Error regions can be combined not only with constraint-based error detection, but also with other approaches such as those based on machine learning. Our experiments provide evidence that the error regions boost the performance of machine learning methods.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Oliver Schulte
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

VisR: An interactive visualization framework for analysis of sequencing data

Date created: 
2019-10-23
Abstract: 

Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However, the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. In this thesis, we focus on addressing this limitation through visualization tools for exploratory analysis of sequencing data and contribute the design and development of two novel systems that are flexible enough to allow a high degree of analysis power, while at the same time are easy to use for non-programmers: (1) a general purpose framework that bridges the gap between the biologists and the bioinformaticians through a system of visual analysis modules that can be rapidly developed and connected together, and (2) a first-of-its-kind system that facilitates visual parameter space analysis for a wide variety of computer models. We start by providing a characterization of the data and an abstraction of the domain tasks in the field of epigenetics and present a design study on development and evaluation of ChAsE, an interactive tool to facilitate analysis and visualization of epigenetic datasets. We will then discuss VisR, a general framework for analysis of sequencing datasets that provides both a computationally rich as well as accessible framework for integrative and interactive analyses through modules called R-apps that utilize packages in R and repositories such as Bioconductor. Our framework provides means for interactive exploration of the results or the R-apps, and supports linking apps to create more complex workflows. It also provides an ecosystem to allow extension and sharing of the apps. We finally present ModEx, a general purpose system for exploring parameters of a variety of computer models. We discuss how the system offers key components of visual parameter space analysis frameworks including parameter sampling, deriving output summaries, and an interactive and customizable exploration interface and explain how it can be used for rapid development of custom solutions for different application domains.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Richard (Hao) Zhang
Torsten Möller
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Service chaining for multicast traffic

Author: 
Date created: 
2020-07-20
Abstract: 

Multicast service chaining refers to the orchestration of network services for multicast traffic. Paths of a multicast session that span the source, destinations and required services form a complex structure that we refer to as the multicast distribution graph. In this thesis, we propose a new path-based algorithm, called Oktopus, that runs at the control plane of the ISP network to calculate the multicast distribution graph for a given session. Oktopus aims at minimizing the routing cost for each multicast session. Oktopus consists of two steps. The first one generates a set of network segments for the ISP network, and the second step uses these segments to efficiently calculate the multicast distribution graph. Oktopus has a fine-grained control over the selection of links in the distribution graphs, which leads to significant improvements in the quality of the calculated graphs. Specifically, Oktopus increases the number of allocated sessions because it can reach ISP locations that have the required services, and thus includes them in the calculated graph. Moreover, Oktopus can reduce the routing cost per session as it carefully chooses links belonging to the graph. We compared Oktopus against the optimal and closest algorithms in simulations using real ISP topologies. Our results show that Oktopus has an optimality gap of 5% on average, and it computes the distribution graphs multiple orders of magnitude faster than the optimal algorithm. Moreover, Oktopus outperforms the closest algorithm in the literature in terms of the number of allocated multicast sessions by up to 37%.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Mohamed Hefeeda
Khaled Diab
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Hierarchical action classification with network pruning

Date created: 
2020-08-13
Abstract: 

Research on human action classification has made significant progresses in the past few years. Most deep learning methods focus on improving performance by adding more network components. We propose, however, to better utilize auxiliary mechanisms, including hierarchical classification, network pruning, and skeleton-based preprocessing, to boost the model robustness and performance. We test the effectiveness of our method on five commonly used testing datasets: NTU RGB+D 60, NTU RGB+D 120, Northwestern-UCLA Multiview Action 3D, UTD Multimodal Human Action Dataset, and Kinetics 400, which is a challenging and different dataset among the others. Our experiments show that our method can achieve either comparable or better performance on all the first four datasets. In particular, our method sets up a new baseline for NTU 120, the largest dataset among the first four. We also analyze our method with extensive comparisons and ablation studies.

Document type: 
Thesis
File(s): 
Supervisor(s): 
KangKang Yin
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

An intermediate representation for transforming and optimizing the microarchitecture of application accelerators

Author: 
Date created: 
2020-07-27
Abstract: 

In recent years, the computing landscape has seen a shift towards specialized accelerators since the scaling of computational capacity is no longer guaranteed for every technology generation. Reconfigurable architectures like FPGAs are promising for accelerator implementation. FPGAs allow implementation of arbitrary logic functions for different classes of applications with better performance per watt over CPUs and GPUs without re-spinning the circuits like fixed-function ASICs. Unfortunately, the software programmer community has stayed away from this technology, primarily because of the abstraction gap exists between software languages and hardware design. Hardware description languages (HDLs) are very low-level, and a hardware designer should think about the design in terms of low-level building blocks such as gates and registers. The alternative to HDLs is High-level synthesis (HLS) tools. HLS frameworks synthesize hardware from a high-level description of an algorithm in the form of untimed mathematical expressions and nested, pipeline and parallel loops in software languages. The primary limitation of HLS is that the functionality and microarchitecture are conflated together in a single language. As a result, making changes to the accelerator design requires code restructuring and microarchitecture optimizations tied by program correctness. In this thesis we propose two new abstractions to decouple functionality from microarchitecture. The first abstraction is a hierarchical intermediate representation for describing parameterized accelerator microarchitecture, targeting reconfigurable architects. In this abstraction, we repre- sent the accelerator as a concurrent structural graph in which components roughly correspond to microarchitecture level hardware blocks. We describe the methods we used to lower the en- tire application graph into a parameterized intermediate hardware abstraction, μIR. We describe the implementation of this intermediate abstraction and an associated pass framework, μopt. We then discuss some of the compiler optimizations that μIR enables, including timing, spatial, and higher-order. The final system is a compiler stack that can take a high-level program as input and translate it into optimized, synthesizable hardware design. The second abstraction is a sequence of instructions that convey the producer-consumer locality between dependent instructions. We show that this new abstraction adapts to different acceleration behaviors by varying the length and the types of fused instructions.

Document type: 
Thesis
File(s): 
Supervisor(s): 
Arrvindh Shriraman
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.