Skip to main content

Mining discriminative items in multiple data streams

Resource type
Thesis type
(Thesis) M.Sc.
Date created
Author: Lin, Zhenhua
How can we maintain a dynamic profile capturing a user’s reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in Europe? To answer such interesting questions, we need to find discriminative items in multiple data streams. In this thesis, we show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is O(|S| log (n1/|S|)), where S is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and processing time per item with respect to |S|. The complexity of all algorithms is independent to the size of the two streams. The extensive empirical study verifies our design.
Copyright statement
Copyright is held by the author.
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact
Scholarly level
Member of collection
Download file Size
etd5906.pdf 555.8 KB

Views & downloads - as of June 2023

Views: 0
Downloads: 0