Lin, Zhenhua

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2010

Authors/Contributors

Author: Lin, Zhenhua

Abstract

How can we maintain a dynamic profile capturing a user’s reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in Europe? To answer such interesting questions, we need to find discriminative items in multiple data streams. In this thesis, we show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is O(|S| log (n1/|S|)), where S is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and processing time per item with respect to |S|. The complexity of all algorithms is independent to the size of the two streams. The extensive empirical study verifies our design.

Keywords

Copyright statement

Copyright is held by the author.

Permissions

The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.

Scholarly level

Graduate student (Masters)

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd5906.pdf	555.8 KB

Mining discriminative items in multiple data streams

Keywords

Views & downloads - as of June 2023