Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

When learning meets RFIDs: The case of activity identification

Author: 
Date created: 
2018-04-04
Abstract: 

Over the past decades have seen booming interests in human activity identification that is widely used in a range of Internet-of-Things applications, such as healthcare and smart homes. It has attracted significant attention from both academia and industry, with a wide range of solutions based on cameras, radars, and/or various inertial sensors. They generally require the object of identification to carry sensors/wireless transceivers, which are not negligible in both size and weight, not to mention the constraints from the battery. Radio frequency identification (RFID) is a promising technology that can overcome those difficulties due to its low cost, small form size, and batterylessness, making it widely used in a range of mobile applications. The information offered by today's RFID tags however are quite limited, and the typical raw data (RSSI and phase angles) are not necessarily good indicators of human activities (being either insensitive or unreliable as revealed by our realworld experiments). As such, existing RFID-based activity identification solutions are far from being satisfactory. It is also well known that the accuracy of the readings can be noticeably affected by multipath, which unfortunately is inevitable in an indoor environment and is complicated with multiple reference tags. In this thesis, we first reviewed the literature and research challenges of multipath effects in activity identification with RFIDs. Then we introduced three advanced RFID learning-based activity identification frameworks, i.e., i2tag, TagFree and M2AI, for tag mobility profiling, RFID-based device-free activity identification and tag-attached multi-object activity identification, respectively. Our extensive experiments further demonstrate their superiority on activity identification in the multipath-rich environments.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jiangchuan Liu
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Improving software quality for regular expression matching tools using automated combinatorial testing

Date created: 
2017-12-19
Abstract: 

Regular expression matching tools (grep) match regular expressions to lines of text. However, because of the complexity that regular expressions can reach, it is challenging to apply state of the art automated testing frameworks to grep tools. Combinatorial testing has shown to be an effective testing methodology, especially for systems with large input spaces. In this dissertation, we investigate the approach of a fully automated combinatorial testing system for regular expression matching tools CoRE (Combinatorial testing for Regular Expressions). CoRE automatically generates test cases using combinatorial testing and measures correctness using differential testing. CoRE outperformed AFL and AFLFast in terms of code coverage testing icGrep, GNU grep and PCRE grep.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Robert Cameron
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Multidimensional Parallelization for Streaming Text Processing Applications Based on Parabix Framework

Author: 
Date created: 
2017-12-15
Abstract: 

Streaming text processing is important for transforming and analyzing the rapidly growing data in modern society. Unfortunately, text processing software written using the sequential byte-at-a-time processing model fails to take full advantage of the resources available on modern processors for many reasons, including significant branch misprediction penalties due to the input-dependent branching structures of text processing applications, cache miss penalties for table-based operations applied per byte of input, and logical complexity that makes it difficult to process more than a single-byte at a time. Common solutions to process text streams that have dependencies from start to end may involve state machine or recursive algorithms, which are generally considered hard to parallelize and hence ill-suited for multicore or manycore processors. However, the Parabix approach to text processing has recently been shown to offer a promising alternative, based on the concept of bitwise data parallelism: a transform representation of text that uses the full width of available processor registers at a density of one bit per input byte. This dissertation investigates the further development of the Parabix framework to incorporate multidimensional parallelization, combining Parabix methods with several different models of multithreading such as task parallelism, data parallelism and pipeline parallelism as well as with GPU-based SIMT processing. A form of data-pipeline parallelism is developed and shown to be beneficial for text-processing applications even with strong sequential state dependencies. Compilers for both pure pipeline parallelism and data-pipeline parallelism are developed and integrated into Parabix framework to provide automated multithreading support for any Parabix applications. Methods for task parallelism and data parallelism are also developed, but need to be customized by the programmer for specific applications. GPU support is added to Parabix framework by translating LLVM IR into PTX, which can be compiled into binary code and run on GPU devices. Programmers can simply choose to use NVPTX driver instead of CPU driver for code that is executed on GPU. Several applications based on Parabix are implemented and tested with different parallelization techniques to analyze the advantages and limits of multidimensional parallelization extensions of Parabix framework. In data-pipeline mode, we are able to achieve 215% speedup compared with the sequential version on a quad-core machine. The GPU implementation has its limitations but can give up to 310% speed-up.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Arrvindh Shriraman
Robert Cameron
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Subspace Clustering Methods for Understandable Information Organization

Author: 
Date created: 
2017-12-08
Abstract: 

Clustering has been widely used to identify possible structures in data and help users understand data in an unsupervised manner. Although clustering methods can provide group information for users, it is still challenging for users to efficiently and effectively understand clustering structures. Subspace clustering methods address this challenge by providing each clustering an understandable feature subspace. Moreover, as high-dimensional data has become more and more popular in real-world applications due to the advances of big data technologies, some subspace clustering methods are especially scalable for high-dimensional data. In this thesis, we study the subspace clustering problem in different application scenarios (e.g., semi-supervised and unsupervised) to bridge the semantic gap between low-level clustering structures and high-level understandable meanings, that is, providing understandable information organization for end users. Specifically, we develop a series of novel subspace clustering methods for different application purposes. Taking image retrieval in CBIR without query input as an example, we study how subspace clustering methods can help retrieve relevant images by relevance feedback, by efficient iterative search, or by data summarization. To tackle the challenges arising from real-world applications, we develop three efficient and effective subspace clustering methods to provide preferred or understandable clustering structures for end users. First, we present a semi-supervised subspace clustering method to discover a feature subspace, in which a preferred clustering structure is hidden. Second, we propose a subspace hierarchical clustering method that can generate a balanced hierarchy with semantics to help users search friendly and efficiently. Third, we develop a subspace multi-clustering method that can automatically discover a certain number of feature subspaces, where different clustering structures reside. Comprehensive empirical studies using synthetic and real data sets demonstrate the effectiveness of our proposed methods.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jian Pei
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

REP3D: 3D Human Motion Capture Dataset for Athletic Movement

Author: 
Date created: 
2017-12-12
Abstract: 

The field of human 3D pose estimation suffers from a small population of diverse public motion capture datasets, each with a low number of environments and subjects. We propose a new dataset including 45 participants and 22 environments, using motion capture technology that allows data collection in arbitrary locations. The dataset is composed of video and motion capture data for athletic actions selected from golf and baseball, recorded from a plurality of angles and distances. The annotation process for semi-automatically aligning video data with ground truth 3D joint locations is fully outlined. The performance of a modern human 3D pose estimation model on a subset of the dataset is reported.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Greg Mori
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Computational study on a branch decomposition based exact distance oracle for planar graphs

Author: 
Date created: 
2017-10-19
Abstract: 

We present a simple exact distance oracle for the point-to-point shortest distance problem in planar graphs. Given an edge weighted planar graph G of n vertices, we decompose G into subgraphs by a branch-decomposition of G, compute the shortest distances between each vertex in a subgraph and the vertices in the boundary of the subgraph, and keep the shortest distances in the oracle. Let bw(G) be the branchwidth of G. Our oracle has O(bw(G)) query time, O(bw(G)n log(n)) size and O(n^2 log(n)) pre-processing time. Computational studies show that our oracle is much faster than Dijkstra’s algorithm for answering point-to-point shortest distance queries for several classes of planar graphs.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Qianping Gu
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Computational exploration of transmission and acquisition of drug-resistant tuberculosis

Author: 
Date created: 
2017-09-07
Abstract: 

Causing the deaths of millions of people annually, tuberculosis is now a universally shared threat. The emergence and spread of drug-resistant tuberculosis strains make effective treatment and control of tuberculosis complicated. Drug-resistance can arise from two sources: acquisition and transmission. Whole genome sequencing now plays an important role in bioinformatics and the research on infectious diseases. Based on the whole genome sequencing data of 110 tuberculosis isolates, a phylogenetic tree was built and the ancestral sequence was reconstructed. A simulation model was then designed to explore the development of drug-resistant tuberculosis in a population and resulted in about 300,000 tuberculosis patients. The tuberculosis patient clustering algorithm, which achieved desirable performance, was then proposed to cluster these tuberculosis patients into epidemiologically related groups. This research explored the transmission and acquisition of drug-resistant tuberculosis, and used modeling and algorithms in support of treatment and control of tuberculosis.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Leonid Chindelevitch
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Model for police dispatching and shift scheduling

Date created: 
2017-12-08
Abstract: 

Emergency services have been in existence since the beginning of recorded history, yet most efficient and effective use of resources in this field is still considered an open problem. In this thesis, we explore the challenges involved in police services, and present an approach for determining police force capacity for a given call for service data. We use mathematical programming for modeling police dispatching and shift scheduling, and test our method on real occurrence data.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Binay Bhattacharya
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

A high-throughput dependency parser

Date created: 
2017-10-17
Abstract: 

Dependency parsing is an important task in NLP, and it is used in many downstream tasks for analyzing the semantic structure of sentences. Analyzing very large corpora in a reasonable amount of time, however, requires a fast parser. In this thesis we develop a transition-based dependency parser with a neural-network decision function which outperforms spaCy, Stanford CoreNLP, and MALTParser in terms of speed while having a comparable, and in some cases better, accuracy. We also develop several variations of our model to investigate the trade-off between accuracy and speed. This leads to a model with a greatly reduced feature set which is much faster but less accurate, as well as a more complex model involving a BiLSTM simultaneously trained to produce POS tags which is more accurate, but much slower. We compare the accuracy and speed of our different parser models against the three mentioned parsers on the Penn Treebank, Universal Dependencies English, and Ontonotes datasets using two different dependency tree representations to show how our parser competes on data from very different domains. Our experimental results reveal that our main model is much faster than the 3 external parsers while also being more accurate in some cases; our reduced feature set model is significantly faster while remaining competitive in terms of accuracy; and our BiLSTM-using model is somewhat faster than CoreNLP and is significantly more accurate.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Advanced techniques for bounded and unbounded repetition in parabix regular expression search

Author: 
Date created: 
2017-09-12
Abstract: 

The three-level architecture of the regular expression search tool named icGrep, which is based on a pure parallel Parabix framework, has shown great speedup compared to conventional search tools. This thesis proposed some advanced techniques for bounded and unbounded repetition in Parabix regular expression search. We first accelerated the bounded repetition type of Unicode unit-length regular expressions by utilizing a log2 technique with the UTF8-to-UTF32 pipeline. To reduce the overhead brought about by the UTF-8-to-UTF-32 transformation, the multiplexed character classes concept was proposed. For the unbounded repetition part, we have reviewed finite automata theory for application to Parabix regular expression matching and proposed a totally different compile pipeline for the local language. In the meanwhile, we proposed star-normal-form optimization to make the RE abstract syntax tree less complex and less ambiguous. All of these innovative techniques have demonstrated their performance dealing with repetition against the basic pipeline.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Robert D. Cameron
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.