Phelan, Gabriel

Resource type

Graduating extended essay / Research project

Thesis type

(Project) M.Sc.

Date created

2020-08-18

Authors/Contributors

Author: Phelan, Gabriel

Abstract

Probabilistic topic models (PTM's) have become a ubiquitous approach for finding a set of latent themes (``topics'') in collections of unstructured text. A simpler, linear algebraic technique for the same problem is non-negative matrix factorization: we are given a matrix with non-negative entries and asked to find a pair of low-rank matrices, also non-negative, whose product is approximately the original matrix. A drawback of NMF is the non-convex nature of the optimization problem it poses. Recent work by the theoretical computer science community addresses this issue, utilizing NMF's inherent structure to find conditions under which the objective function admits convexity. With convexity comes tractability, and the central theme of this thesis is the exploitation of this tractability to ally NMF with resampling-based nonparametrics. Our motivating example is one in which a document collection exhibits some kind of partitioning according to a discrete, indexical covariate, and the goal is to assess the influence of this partitioning on document content; we call this scenario a grouped topic model. Computation relies on several well-studied tools from numerical linear algebra and convex programming which are especially well suited for synthesis with permutation tests and the bootstrap. The result is a set of simple, fast, and easily implementable methodologies for performing inference in grouped topic models. This is contrast to parallel developments in PTM's where ever-more cumbersome inference schemes are required to fit complex graphical models.

Keywords

Identifier

etd21047

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Campbell, David

Language

English

Member of collection

Statistics and Actuarial Science Theses

Download file	Size
input_data\20921\etd21047.pdf	692.47 KB

New perspectives on non-negative matrix factorization for grouped topic models

Keywords

Views & downloads - as of June 2023