Resource type
Thesis type
(Thesis) M.Sc.
Date created
2017-03-29
Authors/Contributors
Author: Nabaei, Boshra
Abstract
Activity monitoring is the continual observation of a stream of events, a task that requires detecting anomalies immediately from a short window of data. For many types of categorical data, such as zip codes and phone numbers, thousands of unique attribute values lead to a sparse frequency vector for each window. Such a vector is unlikely to be similar to the frequency vector obtained from a training set collected over a longer period of time, even when the window contains no anomaly. In this work, we use topic models to present a dimensionality reduction method that detects anomalous windows of categorical data with a low rate of false positives. We apply nonparametric Bayesian topic models to address the variable nature of the data, which allows the model parameters to be updated during continual observation to capture gradual changes in user behavior. Our experiments on several real-life datasets show that the proposed model outperforms state-of-the-art methods for activity monitoring in categorical data with large domains of attribute values.
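The sparsity problem described in the abstract can be illustrated with a minimal sketch. It does not implement the thesis's topic-model method; it only shows why a short window's raw frequency vector tends to look dissimilar to the training vector even when both come from the same source. The domain size, the window lengths, the uniform event distribution, and the use of cosine similarity are all assumptions chosen for illustration, not details taken from the thesis.

```python
import math
import random
from collections import Counter


def frequency_vector(events, domain_size):
    """Count how often each attribute value (0..domain_size-1) occurs."""
    counts = Counter(events)
    return [counts.get(value, 0) for value in range(domain_size)]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sample_events(rng, domain_size, n_events):
    """Draw events uniformly from the domain (an illustrative assumption)."""
    return rng.choices(range(domain_size), k=n_events)


# Hypothetical sizes: a large categorical domain, a long training period,
# a short monitoring window, and a long window for comparison.
DOMAIN_SIZE = 5000
rng = random.Random(0)

training = frequency_vector(sample_events(rng, DOMAIN_SIZE, 100_000), DOMAIN_SIZE)
short_window = frequency_vector(sample_events(rng, DOMAIN_SIZE, 50), DOMAIN_SIZE)
long_window = frequency_vector(sample_events(rng, DOMAIN_SIZE, 100_000), DOMAIN_SIZE)

# All three samples come from the same distribution, so none is anomalous.
print("non-zero entries in short window:", sum(1 for c in short_window if c > 0))
print("short window vs training:", cosine_similarity(short_window, training))
print("long window vs training:", cosine_similarity(long_window, training))
```

Under these assumptions the short window scores far lower than the long one although neither is anomalous, which is the false-positive risk that motivates comparing windows in a lower-dimensional representation instead of over raw attribute values.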
Document
Identifier
etd10041
Copyright statement
Copyright is held by the author.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Ester, Martin
Member of collection
Download file | Size |
---|---|
etd10041_BNabaei.pdf | 3.5 MB |