Computing Science - Theses, Dissertations, and other Required Graduate Degree Essays

Receive updates for this collection

Mobile Based OLAP Using Parallel Processing and Multithreading

Author: 
Date created: 
2016-08-03
Abstract: 

This work is committed to discovering the influence of parallel mechanism on I/O heavy applications such as OLAP operations on remote data set running on a resource constrained client. In this research, we have chosen an Android based smartphone as our client. We have implemented a mobile phone based client-centric OLAP application utilizing the potentiality of parallel processing mechanism of Android. We are dividing user request and associated caching tasks into smaller units and distributing them to multiple threads to execute concurrently. While serving user's requested query data, we are downloading and caching (hash based pre-emptive caching) extra sub-cube data in the background by employing multiple threads running in parallel. This yields better performance for succeeding cross-tab or drill-down actions. We have captured test results by varying number of threads and amount of data size associated with each thread. Result shows that parallel approach has striking performance improvement over sequential approach.

Document type: 
Graduating extended essay / Research project
File(s): 
Senior supervisor: 
Wo-shun Luk
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Project) M.Sc.

Towards Better User Preference Learning for Recommender Systems

Author: 
Date created: 
2016-08-11
Abstract: 

In recent years, recommender systems have become widely utilized by businesses across industries. Given a set of users, items, and observed user-item interactions, these systems learn user preferences by collective intelligence, and deliver proper items under various contexts to improve user engagements and merchant profits. Collaborative Filtering is the most popular method for recommender systems. The principal idea of Collaborative Filtering is that users might be interested in the items that are preferred by users with similar preferences. Therefore, learning user preferences is the core technique of Collaborative Filtering. In this thesis, we study new methods to help us better understand user preferences from three perspectives. We first dive into each rating that users give on the items, and study the reasons behind the ratings by analyzing user reviews. We propose a unified model that combines the advantages of aspect-based opinion mining and collaborative filtering, which could extract latent aspects and sentiments from reviews and learn users' preferences of different aspects of items collectively. In our next work, we study the problem from each user's perspective, and propose a general and flexible model that embraces several popular models as special cases. The new model achieves better top-N recommendation results on several popular data sets. Finally, we study how to utilize the general structure of the user-item matrix to better apply collaborative filtering methods. We propose a co-clustering based method that first partitions the users and items into several overlapping and focused subgroups, and then applies collaborative filtering methods within each subgroup. The final recommendations for users are aggregated from the results from the subgroups they are involved in. Experimental results show that this method could produce better recommendations than other co-clustering methods and methods that directly apply collaborative filtering on the original user-item matrix.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Martin Ester
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Dissertation) Ph.D.

Planning with preferences, specification and search

Date created: 
2016-08-12
Abstract: 

Planning is one of the fundamental problems of artificial intelligence. A classic planning problem consists of an initial world state, a set of goal conditions, and a set of actions. The solution to the problem is a sequence of actions, a plan, that when applied to the initial state, leads to a final goal state that satisfies all goal conditions. In many cases, it makes sense to add preferences to a planning problem. A preference can be viewed as a 'soft' goal. Ideally, all preferences will be satisfied by a plan, but this is not necessary, and not necessarily possible. There are two basic components to planning with preferences, specifying the preferences and the search for a high quality plan. For the former, we present a rich language for specifying preferences for planning problems. For the latter, we use heuristic guided search, a successful approach to classic planning, for planning with preferences. Landmark based heuristics have been successful in classic planning, so we examine how they can be used for preferences. Finally, we present an adaptation to the greedy best-first search algorithm, Cascading Search, that diversifies the search space and examine how effectively it speeds the search process and ensure discovery of quality plans.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Adaptive Path Planning for Navigation and Sensing of Micro Aerial Vehicles

Date created: 
2016-08-03
Abstract: 

Micro Aerial Vehicles (MAVs) have gained much attention as data collection and sensing platforms. Professionals in many fields can access rich sensory data with fast update rates by performing automated aerial surveys using MAVs. However, these robots have limited payloads and short flight times. Therefore, it is useful to perform a task with light and low-powered sensors and as quickly as possible. In this proposal, we consider some of the fundamental tasks performed by MAVs and propose methods by which a MAV can achieve these tasks more efficiently and robustly. In the first part of this thesis, we consider the task of navigation in which a MAV, using visual Simultaneous Localization and Mapping (SLAM) to map the environment and localize itself within it, moves from its current location to a goal location. As SLAM is highly dependent on the visibility of visual features, we propose an adaptive path planning approach that avoids visually-poor regions of the environment and can generate safe trajectories for the MAV to perform the navigation task. In the second part, an aerial coverage task is considered where the MAV must map interesting regions with initially unknown locations. Rather than using an exhaustive ‘lawnmower’ coverage pattern that sweeps the entire region uniformly, we propose non-uniform coverage strategies that adaptively cover all the interesting regions with high resolution and coarsely sweep the rest of the area. The proposed methods generate shorter trajectories compared to a uniform lawnmower pattern. In the last part of the thesis, we assume a time/energy budget for the vehicle and consider specific costs for different manoeuvres of the MAV. We introduce a novel problem of guaranteeing complete coverage of an area at low resolution, while identifying regions of interest (ROIs), and locally surveying as much of the ROIs at high resolution as the battery allows. Three different policies are proposed to decide when to switch between the coarse survey and high resolution imagery data collection.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Richard Vaughan
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Dissertation) Ph.D.

Decipherment of Evasive or Encrypted Offensive Text

Author: 
Date created: 
2016-07-20
Abstract: 

Automated filters are commonly used in on line chat to stop users from sending malicious messages such as age-inappropriate language, bullying, and asking users to expose personal information. Rule based filtering systems are the most common way to deal with this problem but people invent increasingly subtle ways to disguise their malicious messages to bypass such filtering systems. Machine learning classifiers can also be used to identify and filter malicious messages, but such classifiers rely on training data that rapidly becomes out of date and new forms of malicious text cannot be classified accurately. In this thesis, we model the disguised messages as a cipher and apply automatic decipherment techniques to decrypt corrupted malicious text back into plain text which can be then filtered using rules or a classifier. We provide experimental results on three different data sets and show that decipherment is an effective tool for this task.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Incorporating Colour Information for Computer-Aided Diagnosis of Melanoma from Dermoscopy Images

Author: 
Date created: 
2016-05-16
Abstract: 

Cutaneous Melanoma is the most life-threatening form of skin cancer. Although advanced melanoma is often considered as incurable, if detected and excised early, the prognosis is promising. Today, clinician use Computer Vision in an increasing number of applications to aid early detection of melanoma through dermatological image analysis (dermoscopy images, in particular). Colour assessment is essential for the clinical diagnosis of skin cancers. Due to this diagnostic importance, many studies have either focused on or employed colour features as a constituent part of their skin lesion analysis systems. These studies range from using low-level colour features, such as simple statistical measures of colours occurring in the lesion, to availing themselves of high-level semantic features such as the presence of blue-white veil, globules or colour variegation in the lesion. This thesis provides a detailed exposition of my recent contributions in this research direction. In particular, it describes two novel approaches for utilizing colour both as low-level and high- level image feature. The first contribution describes a technique that employs the stochastic Latent Topic Models framework to allow quantification of melanin and hemoglobin content in dermoscopy images. Such information bears useful implications for the analysis of skin hyper-pigmentation, and for classification of skin diseases. The second contribution is a novel approach to identify one of the most significant dermoscopic criteria in the diagnosis of Cutaneous Melanoma: the Blue-whitish structure. This is achieved in a Multiple Instance Learning framework with only image-level labels of whether the feature is present or not. As the output, we predict the image label and also localize the feature in the image. Experiments are conducted on a challenging dataset with results outperforming state-of-the-art. Moreover, the thesis explores using physic-based photometric models to enhance dermoscopy image analysis. In particular, it proposes methods for colour-to-greyscale conversion, shading removal, and glare attenuation. The studies reported in this thesis provide an improvement on the scope of modelling for computerized image analysis of skin lesions.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Mark S. Drew
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.

Left-to-Right Hierarchical Phrase-based Machine Translation

Author: 
Date created: 
2016-07-13
Abstract: 

Hierarchical phrase-based translation (Hiero for short) models statistical machine translation (SMT) using a lexicalized synchronous context-free grammar (SCFG) extracted from word aligned bitexts. The standard decoding algorithm for Hiero uses a CKY-style dynamic programming algorithm with time complexity O(n^3) for source input with n words. Scoring target language strings using a language model in CKY-style decoding requires two histories per hypothesis making it significantly slower than phrase-based translation which only keeps one history per hypothesis. In addition, the size of a Hiero SCFG grammar is typically much larger than phrase-based models when extracted from the same data which also slows down decoding. In this thesis we address these issues in Hiero by adopting a new translation model and decoding algorithm called Left-to-Right hierarchical phrase-based translation (LR-Hiero for short). LR-Hiero uses a constrained form of lexicalized SCFG rules to encode translation, where the target-side is constrained to be prefix-lexicalized. LR-Hiero uses a decoding algorithm with time complexity O(n^2) that generates the target language output in left-to-right manner which keeps only one history per hypothesis resulting in faster decoding for Hiero grammars. The thesis contains the following contributions: (i) We propose a novel dynamic programming algorithm for rule extraction phase. Unlike traditional Hiero rule extraction which performs a brute-force search, LR-Hiero rule extraction is linear in the number of rules. (ii) We propose an augmented version of LR-decoding algorithm previously proposed by (Watanabe+, ACL 2006). Our modified LR-decoding algorithm addresses issues related to decoding time and translation quality and is shown to be more efficient than the CKY decoding algorithm in our experimental results. (iii) We extend our LR-decoding algorithm to capture all hierarchical phrasal alignments that are reachable in CKY-style decoding algorithms. (iv) We introduce a lexicalized reordering model to LR-Hiero that significantly improves the translation quality. (v) We apply LR-Hiero to the task of simultaneous translation; the first attempt to use Hiero models in simultaneous translation. We show that we can perform online segmentation on the source side to improve latency and maintain translation quality.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Dissertation) Ph.D.

An investigation of iterated multi-agent belief change

Author: 
Date created: 
2016-05-19
Abstract: 

Multi-agent belief change is an area concerned with the belief dynamics of a network of communicating agents. A network is represented by a graph, where vertices represent agents that share information via a process of minimizing disagreements between themselves. Previous work by Delgrande, Lang, and Schaub addressed belief change through global minimization, with a weak notion of distance between agents. We extend it by applying iterative procedures that take distance into account. We have identified two approaches to iteration: in the first, a vertex incorporates information from its immediate neighbours only; in the second, a vertex incorporates information from progressively more distant neighbours. Our research has both theoretical and practical contributions: first, we define the iterative approaches, find relationships between them, and investigate their logical properties; then, we introduce a software system called Equibel that implements both the global and iterative approaches, using Answer Set Programming and Python.

Document type: 
Thesis
File(s): 
Senior supervisor: 
James Delgrande
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Efficiently compressing string columnar data using frequent pattern mining

Author: 
Date created: 
2016-06-20
Abstract: 

In modern column-oriented databases, compression is important for improving I/O throughput and overall database performance. Many string columnar data cannot be compressed by special-purpose algorithms such as run-length encoding or dictionary compression, and the typical choice for them is the LZ77-based compression algorithms such as GZIP or Snappy. These algorithms treat data as a byte block and do not exploit the columnar nature of the data. In this thesis, we develop a compression algorithm using frequent string patterns directly mined from a sample of a string column. The patterns are used as the dictionary phrases for compression. We discuss some interesting properties of frequent patterns in the context of compression, and develop a pruning method to address the cache inefficiencies in indexing the patterns. Experiments show that our compression algorithm outperforms Snappy in compression ratio while retains compression and decompression speed.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Jian Pei
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.

Bilingual Language Models using Word Embeddings for Machine Translation

Date created: 
2016-05-19
Abstract: 

Bilingual language models (Bi-LMs) refer to language models that are modeled using both source and target words in a parallel corpus. While translating a source sentence to a target language, the decoder in phrase-based machine translation system breaks down the source sentence into phrases. It then translates each phrase into the target language. While decoding each phrase, the decoder has very little information about source words that are outside the current phrase in consideration. Bi-LMs have been used to provide more information about source words outside the current phrase. Bi-LMs are estimated by first creating bitoken sequences using a parallel corpus and the word alignments between the source and target words in that corpus. When creating the bitoken sequences, the vocabulary expands considerably and Bi-LMs suffer due to this huge vocabulary which in turn increases the sparsity of the language models. In previous work, bitokens were created by first replacing each word in the parallel corpus either by their part-of-speech tags or word classes after clustering using the Brown clustering algorithm. Both of these approaches only take into account words that are direct translations of each other as they only depend on word alignments between the source word and target word in the bitokens. In this thesis, we propose the use of bilingual word embeddings as a first step to reduce the vocabulary of the bitokens. Bilingual word embeddings are a low dimensional representation of words trained on a parallel corpus of aligned sentences in two languages. Using bilingual word embeddings to build Bi-LMs for machine translation is significantly better than the previous state of the art that uses Bi-LMs with an increase of 1.4 BLEU points in our experiments.

Document type: 
Thesis
File(s): 
Senior supervisor: 
Anoop Sarkar
Fred Popowich
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) M.Sc.