Mining discriminative items in multiple data streams

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2010
Authors/Contributors
Author: Lin, Zhenhua
Abstract
How can we maintain a dynamic profile capturing a user’s reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in Europe? To answer such interesting questions, we need to find discriminative items in multiple data streams. In this thesis, we show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is O(|S| log (n1/|S|)), where S is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and processing time per item with respect to |S|. The complexity of all algorithms is independent to the size of the two streams. The extensive empirical study verifies our design.
Document
Copyright statement
Copyright is held by the author.
Permissions
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact summit-permissions@sfu.ca.
Scholarly level
Language
English
Member of collection
Attachment Size
etd5906.pdf 555.8 KB