Activity monitoring using topic models

Date created: 
Non-parametric Bayesian Topic Models
Dirichlet Process Mixture Model
Anomaly Detection
High-Dimensional Categorical Data

Activity monitoring is the continual observation of a stream of events, in which anomalies must be detected immediately from a short window of data. For many types of categorical data, such as zip codes and phone numbers, the thousands of unique attribute values make the frequency vector of a short window sparse. Such a vector is unlikely to resemble the frequency vector obtained from a training set collected over a longer period. In this work, we use topic models for dimensionality reduction and present a method that detects anomalous windows of categorical data with a low false-positive rate. We apply nonparametric Bayesian topic models to handle the variable nature of the data; they allow the model parameters to be updated during continual observation, capturing gradual changes in user behavior. Experiments on several real-life datasets show that the proposed model outperforms state-of-the-art methods for activity monitoring in categorical data with large domains of attribute values.
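
The sparsity problem described in the abstract can be illustrated with a small sketch. This is not the thesis's method: the fixed value-to-topic mapping `topic_of`, the synthetic event generator, the domain size, and the use of cosine similarity are all assumptions made for illustration, standing in for topics that a nonparametric Bayesian model would learn from data. The sketch compares a short window against a long training set, first on raw value frequencies and then on counts aggregated per topic.

```python
import math
import random
from collections import Counter

NUM_VALUES = 5000   # assumed size of the categorical domain
NUM_TOPICS = 5      # assumed number of topics (a DP mixture would infer this)


def topic_of(value):
    # Hypothetical hard assignment of a value to a topic; a real topic
    # model would learn a soft assignment from the training data.
    return value % NUM_TOPICS


def sample_events(n, topic_weights, rng):
    # Draw n synthetic events: pick a topic, then a value inside that topic.
    events = []
    for _ in range(n):
        topic = rng.choices(range(NUM_TOPICS), weights=topic_weights)[0]
        value = rng.randrange(NUM_VALUES // NUM_TOPICS) * NUM_TOPICS + topic
        events.append(value)
    return events


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (Counters).
    dot = sum(count * b.get(key, 0) for key, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_counts(events):
    # Reduce a window from NUM_VALUES dimensions to NUM_TOPICS dimensions.
    return Counter(topic_of(v) for v in events)


rng = random.Random(0)
normal_mix = [0.5, 0.3, 0.1, 0.05, 0.05]
anomalous_mix = [0.05, 0.05, 0.1, 0.3, 0.5]

training = sample_events(50_000, normal_mix, rng)       # long history
normal_window = sample_events(50, normal_mix, rng)      # short window
anomalous_window = sample_events(50, anomalous_mix, rng)

raw_normal = cosine(Counter(normal_window), Counter(training))
topic_normal = cosine(topic_counts(normal_window), topic_counts(training))
topic_anomalous = cosine(topic_counts(anomalous_window), topic_counts(training))

print(f"raw similarity, normal window:      {raw_normal:.3f}")
print(f"topic similarity, normal window:    {topic_normal:.3f}")
print(f"topic similarity, anomalous window: {topic_anomalous:.3f}")
```

In this synthetic setting, the normal window looks dissimilar to the training data in the raw value space simply because 50 events cover very few of the 5000 values, whereas in the reduced topic space the normal window is close to the training profile and the anomalous window is not.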

Document type: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Senior supervisor: Martin Ester
Applied Sciences: School of Computing Science
Thesis type: (Thesis) M.Sc.