How can we maintain a dynamic profile capturing a user’s reading interest against the common interest? What are the queries that have been asked 1,000 times more frequently to a search engine from users in Asia than in Europe? To answer such interesting questions, we need to find discriminative items in multiple data streams. In this thesis, we show that, to exactly find all discriminative items in stream S1 against stream S2 by one scan, the space lower bound is O(|S| log (n1/|S|)), where S is the alphabet of items and n1 is the current size of S1. To tackle the space challenge, we develop three heuristic algorithms that can achieve high precision and recall using sub-linear space and processing time per item with respect to |S|. The complexity of all algorithms is independent to the size of the two streams. The extensive empirical study verifies our design.
Copyright is held by the author.
The author has not granted permission for the file to be printed nor for the text to be copied and pasted. If you would like a printable copy of this thesis, please contact email@example.com.
Member of collection