Fung, Benjamin Chin Ming

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2002

Authors/Contributors

Author: Fung, Benjamin Chin Ming

Abstract

Most state-of-the-art document clustering methods are modifications of traditional clustering algorithms that were originally designed for data tuples in relational or transactional databases. However, they become impractical in real-world document clustering, which requires special handling for high dimensionality, high volume, and ease of browsing. Furthermore, incorrect estimation of the number of clusters often yields poor clustering accuracy. In this thesis, we propose to use the notion of frequent itemsets, which comes from association rule mining, for document clustering. The intuition behind our clustering criterion is that each cluster has some common words, called frequent itemsets. We use such words to cluster documents, and a hierarchical topic tree is then constructed from the clusters. Since we use frequent itemsets as a preliminary step, the dimension of each document is drastically reduced, which in turn increases efficiency and scalability.
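
The idea of grouping documents by shared frequent itemsets can be illustrated with a minimal Python sketch. It is not the algorithm developed in the thesis: the function name frequent_itemsets, the parameters min_support and max_size, and the toy documents are all assumptions made for illustration. The search is brute force rather than a real association rule miner, and the later steps the abstract mentions (resolving overlapping clusters, building the topic tree) are not shown.

```python
from itertools import combinations

def frequent_itemsets(docs, min_support, max_size=2):
    """Map each frequent word set to the indices of the documents containing it.

    Hypothetical helper for illustration only: it enumerates every word
    combination up to max_size, which is feasible only for tiny vocabularies.
    """
    n = len(docs)
    vocab = sorted(set().union(*docs))
    result = {}
    for size in range(1, max_size + 1):
        for items in combinations(vocab, size):
            # Documents that contain every word of the candidate itemset.
            covering = {i for i, d in enumerate(docs) if d.issuperset(items)}
            if len(covering) / n >= min_support:
                result[frozenset(items)] = covering
    return result

# Toy corpus (assumed data): each document is reduced to a set of words.
docs = [
    {"apple", "fruit", "juice"},
    {"apple", "fruit", "pie"},
    {"engine", "car", "wheel"},
    {"car", "wheel", "road"},
]

# Each frequent itemset acts as a label for a candidate cluster of documents.
clusters = frequent_itemsets(docs, min_support=0.5)
for itemset, members in clusters.items():
    print(sorted(itemset), sorted(members))
```

In this sketch a document can belong to several candidate clusters at once, and only the words that occur in some frequent itemset matter afterwards, which is the dimensionality reduction the abstract refers to.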

Copyright statement

Copyright is held by the author.

Permissions

The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.

Scholarly level

Graduate student (Masters)

Language

English

Member of collection

Computing Science Theses

Download file	Size
b2655937a.pdf	809.04 KB

Hierarchical Document Clustering Using Frequent Itemsets
