Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Planning with preferences, specification and search

Date created: 
2016-08-12
Abstract: 

Planning is one of the fundamental problems of artificial intelligence. A classic planning problem consists of an initial world state, a set of goal conditions, and a set of actions. The solution to the problem is a sequence of actions, a plan, that when applied to the initial state, leads to a final goal state that satisfies all goal conditions. In many cases, it makes sense to add preferences to a planning problem. A preference can be viewed as a 'soft' goal. Ideally, all preferences will be satisfied by a plan, but this is not necessary, and not necessarily possible. There are two basic components to planning with preferences, specifying the preferences and the search for a high quality plan. For the former, we present a rich language for specifying preferences for planning problems. For the latter, we use heuristic guided search, a successful approach to classic planning, for planning with preferences. Landmark based heuristics have been successful in classic planning, so we examine how they can be used for preferences. Finally, we present an adaptation to the greedy best-first search algorithm, Cascading Search, that diversifies the search space and examine how effectively it speeds the search process and ensure discovery of quality plans.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Adaptive Path Planning for Navigation and Sensing of Micro Aerial Vehicles

Date created: 
2016-08-03
Abstract: 

Micro Aerial Vehicles (MAVs) have gained much attention as data collection and sensing platforms. Professionals in many fields can access rich sensory data with fast update rates by performing automated aerial surveys using MAVs. However, these robots have limited payloads and short flight times. Therefore, it is useful to perform a task with light and low-powered sensors and as quickly as possible. In this proposal, we consider some of the fundamental tasks performed by MAVs and propose methods by which a MAV can achieve these tasks more efficiently and robustly. In the first part of this thesis, we consider the task of navigation in which a MAV, using visual Simultaneous Localization and Mapping (SLAM) to map the environment and localize itself within it, moves from its current location to a goal location. As SLAM is highly dependent on the visibility of visual features, we propose an adaptive path planning approach that avoids visually-poor regions of the environment and can generate safe trajectories for the MAV to perform the navigation task. In the second part, an aerial coverage task is considered where the MAV must map interesting regions with initially unknown locations. Rather than using an exhaustive ‘lawnmower’ coverage pattern that sweeps the entire region uniformly, we propose non-uniform coverage strategies that adaptively cover all the interesting regions with high resolution and coarsely sweep the rest of the area. The proposed methods generate shorter trajectories compared to a uniform lawnmower pattern. In the last part of the thesis, we assume a time/energy budget for the vehicle and consider specific costs for different manoeuvres of the MAV. We introduce a novel problem of guaranteeing complete coverage of an area at low resolution, while identifying regions of interest (ROIs), and locally surveying as much of the ROIs at high resolution as the battery allows. Three different policies are proposed to decide when to switch between the coarse survey and high resolution imagery data collection.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Richard Vaughan
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Dissertation) Ph.D.

Decipherment of Evasive or Encrypted Offensive Text

Author: 
Date created: 
2016-07-20
Abstract: 

Automated filters are commonly used in on line chat to stop users from sending malicious messages such as age-inappropriate language, bullying, and asking users to expose personal information. Rule based filtering systems are the most common way to deal with this problem but people invent increasingly subtle ways to disguise their malicious messages to bypass such filtering systems. Machine learning classifiers can also be used to identify and filter malicious messages, but such classifiers rely on training data that rapidly becomes out of date and new forms of malicious text cannot be classified accurately. In this thesis, we model the disguised messages as a cipher and apply automatic decipherment techniques to decrypt corrupted malicious text back into plain text which can be then filtered using rules or a classifier. We provide experimental results on three different data sets and show that decipherment is an effective tool for this task.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Incorporating Colour Information for Computer-Aided Diagnosis of Melanoma from Dermoscopy Images

Author: 
Date created: 
2016-05-16
Abstract: 

Cutaneous Melanoma is the most life-threatening form of skin cancer. Although advanced melanoma is often considered as incurable, if detected and excised early, the prognosis is promising. Today, clinician use Computer Vision in an increasing number of applications to aid early detection of melanoma through dermatological image analysis (dermoscopy images, in particular). Colour assessment is essential for the clinical diagnosis of skin cancers. Due to this diagnostic importance, many studies have either focused on or employed colour features as a constituent part of their skin lesion analysis systems. These studies range from using low-level colour features, such as simple statistical measures of colours occurring in the lesion, to availing themselves of high-level semantic features such as the presence of blue-white veil, globules or colour variegation in the lesion. This thesis provides a detailed exposition of my recent contributions in this research direction. In particular, it describes two novel approaches for utilizing colour both as low-level and high- level image feature. The first contribution describes a technique that employs the stochastic Latent Topic Models framework to allow quantification of melanin and hemoglobin content in dermoscopy images. Such information bears useful implications for the analysis of skin hyper-pigmentation, and for classification of skin diseases. The second contribution is a novel approach to identify one of the most significant dermoscopic criteria in the diagnosis of Cutaneous Melanoma: the Blue-whitish structure. This is achieved in a Multiple Instance Learning framework with only image-level labels of whether the feature is present or not. As the output, we predict the image label and also localize the feature in the image. Experiments are conducted on a challenging dataset with results outperforming state-of-the-art. Moreover, the thesis explores using physic-based photometric models to enhance dermoscopy image analysis. In particular, it proposes methods for colour-to-greyscale conversion, shading removal, and glare attenuation. The studies reported in this thesis provide an improvement on the scope of modelling for computerized image analysis of skin lesions.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Mark S. Drew
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Left-to-Right Hierarchical Phrase-based Machine Translation

Author: 
Date created: 
2016-07-13
Abstract: 

Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n^3) for source input with n words. Scoring target language strings using a language model in CKY-style decoding requires two histories per hypothesis making it significantly slower than phrase-based translation which only keeps one history per hypothesis. In addition, the size of a Hiero SCFG grammar is typically much larger than phrase-based models when extracted from the same data which also slows down decoding. In this thesis we address these issues in Hiero by adopting a new translation model and decoding algorithm called Left-to-Right hierarchical phrase-based translation (LR-Hiero for short). LR-Hiero uses a constrained form of lexicalized SCFG rules to encode translation, where the target-side is constrained to be prefix-lexicalized. LR-Hiero uses a decoding algorithm with time complexity O(n^2) that generates the target language output in left-to-right manner which keeps only one history per hypothesis resulting in faster decoding for Hiero grammars. The thesis contains the following contributions: (i) We propose a novel dynamic programming algorithm for rule extraction phase. Unlike traditional Hiero rule extraction which performs a brute-force search, LR-Hiero rule extraction is linear in the number of rules. (ii) We propose an augmented version of LR-decoding algorithm previously proposed by (Watanabe+, ACL 2006). Our modified LR-decoding algorithm addresses issues related to decoding time and translation quality and is shown to be more efficient than the CKY decoding algorithm in our experimental results. (iii) We extend our LR-decoding algorithm to capture all hierarchical phrasal alignments that are reachable in CKY-style decoding algorithms. (iv) We introduce a lexicalized reordering model to LR-Hiero that significantly improves the translation quality. (v) We apply LR-Hiero to the task of simultaneous translation; the first attempt to use Hiero models in simultaneous translation. We show that we can perform online segmentation on the source side to improve latency and maintain translation quality.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Dissertation) Ph.D.

An investigation of iterated multi-agent belief change

Author: 
Date created: 
2016-05-19
Abstract: 

Multi-agent belief change is an area concerned with the belief dynamics of a network of communicating agents. A network is represented by a graph, where vertices represent agents that share information via a process of minimizing disagreements between themselves. Previous work by Delgrande, Lang, and Schaub addressed belief change through global minimization, with a weak notion of distance between agents. We extend it by applying iterative procedures that take distance into account. We have identified two approaches to iteration: in the first, a vertex incorporates information from its immediate neighbours only; in the second, a vertex incorporates information from progressively more distant neighbours. Our research has both theoretical and practical contributions: first, we define the iterative approaches, find relationships between them, and investigate their logical properties; then, we introduce a software system called Equibel that implements both the global and iterative approaches, using Answer Set Programming and Python.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Efficiently compressing string columnar data using frequent pattern mining

Author: 
Date created: 
2016-06-20
Abstract: 

In modern column-oriented databases, compression is important for improving I/O throughput and overall database performance. Many string columnar data cannot be compressed by special-purpose algorithms such as run-length encoding or dictionary compression, and the typical choice for them is the LZ77-based compression algorithms such as GZIP or Snappy. These algorithms treat data as a byte block and do not exploit the columnar nature of the data. In this thesis, we develop a compression algorithm using frequent string patterns directly mined from a sample of a string column. The patterns are used as the dictionary phrases for compression. We discuss some interesting properties of frequent patterns in the context of compression, and develop a pruning method to address the cache inefficiencies in indexing the patterns. Experiments show that our compression algorithm outperforms Snappy in compression ratio while retains compression and decompression speed.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jian Pei
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Bilingual Language Models using Word Embeddings for Machine Translation

Date created: 
2016-05-19
Abstract: 

Bilingual language models (Bi-LMs) refer to language models that are modeled using both source and target words in a parallel corpus. While translating a source sentence to a target language, the decoder in phrase-based machine translation system breaks down the source sentence into phrases. It then translates each phrase into the target language. While decoding each phrase, the decoder has very little information about source words that are outside the current phrase in consideration. Bi-LMs have been used to provide more information about source words outside the current phrase. Bi-LMs are estimated by first creating bitoken sequences using a parallel corpus and the word alignments between the source and target words in that corpus. When creating the bitoken sequences, the vocabulary expands considerably and Bi-LMs suffer due to this huge vocabulary which in turn increases the sparsity of the language models. In previous work, bitokens were created by first replacing each word in the parallel corpus either by their part-of-speech tags or word classes after clustering using the Brown clustering algorithm. Both of these approaches only take into account words that are direct translations of each other as they only depend on word alignments between the source word and target word in the bitokens. In this thesis, we propose the use of bilingual word embeddings as a first step to reduce the vocabulary of the bitokens. Bilingual word embeddings are a low dimensional representation of words trained on a parallel corpus of aligned sentences in two languages. Using bilingual word embeddings to build Bi-LMs for machine translation is significantly better than the previous state of the art that uses Bi-LMs with an increase of 1.4 BLEU points in our experiments.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Fred Popowich
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Random Walks in the Edge Sampling Model

Author: 
Date created: 
2016-06-30
Abstract: 

The random walk is an important tool to analyze the structural features of graphs such as the community structure and the PageRank. The problem of generating a random walk may be hard if we are not given full access to the graph. The main component of this thesis is solving the problem in one such model with restricted access to the graph, the edge sampling model. We design Sampling-AS, a randomized algorithm that efficiently samples the endpoint of a random walk, unless some unlikely event happens during the execution of the algorithm. We also propose Sampling-LS, a randomized algorithm that always samples the endpoint of a random walk; however, its performance is not as good. Moreover, we slightly modify both algorithms to improve their performance on some special classes of graphs such as regular graphs, random graphs and fast mixing graphs. Finally, we consider some applications for both algorithms.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Andrei Bulatov
Petra Berenbrink
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

An Algorithmic Study on Ridesharing Problem

Date created: 
2016-07-28
Abstract: 

The ridesharing problem is to share personal vehicles by individuals (participants) with similar itineraries. A trip in the ridesharing problem is a participant and his/her itinerary. To realize a trip is to deliver the participant to his/her destination by a vehicle satisfying the itinerary requirement. In this thesis, we focus on two optimization goals: minimize the number of vehicles and minimize the total travel distance of vehicles to realize all trips. The minimization problems are complex and NP-hard because of many parameters. We simplify the problems by considering only some of the parameters. We prove that some simplified minimization problems are NP-hard while a further simplified variant is polynomial time solvable. These suggest a boundary between the NP-hard and polynomial time solvable cases. We also propose a novel approach for finding a maximum matching in hypergraphs with special properties, extending the well-known maximum matching theorem in graphs to hypergraphs.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Qianping Gu
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.