Despite extensive research on similarity search and similarity joins in databases, little work has addressed similarity search when the query itself evolves. How can we select advertisements for users who recently registered at a Q&A website so as to capture their evolving interests? How can we continuously recommend songs as a user hums? Motivated by these application scenarios, we study the novel problem of continuous similarity search for evolving queries. Given a set of objects, each a set or multiset of items, and a data stream whose elements are drawn from the same lexicon as the objects, we want to continuously maintain the top-k objects most similar to an evolving query formed by the last n items in the stream. We develop a filtering-based method and a MinHash-based method. Experimental results on real and synthetic data sets show that our methods are effective and efficient.
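To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not the filtering-based or MinHash-based method developed in the thesis; it is only a naive baseline that rescores every object after each stream item. The similarity measure (Jaccard similarity generalized to multisets) and the names `multiset_jaccard` and `continuous_top_k` are assumptions made for illustration, since the abstract does not name a specific measure.

```python
from collections import Counter, deque
import heapq


def multiset_jaccard(a, b):
    """Jaccard similarity on multisets (assumed measure, not from the thesis)."""
    inter = sum((a & b).values())  # element-wise minimum of counts
    union = sum((a | b).values())  # element-wise maximum of counts
    return inter / union if union else 0.0


def continuous_top_k(objects, stream, n, k):
    """Yield the top-k (similarity, name) pairs after each stream item.

    objects: dict mapping an object name to a Counter of its items.
    The evolving query is the multiset of the last n stream items.
    """
    window = deque()   # last n items, in arrival order
    query = Counter()  # the same items as a multiset
    for item in stream:
        window.append(item)
        query[item] += 1
        if len(window) > n:
            expired = window.popleft()
            query[expired] -= 1
            if query[expired] == 0:
                del query[expired]
        # Naive baseline: rescore every object on every update.
        scored = [(multiset_jaccard(query, obj), name)
                  for name, obj in objects.items()]
        yield heapq.nlargest(k, scored)
```

Rescoring all objects on each arrival costs time proportional to the total size of the object collection per stream item, which is the kind of overhead that filtering or sketching techniques are meant to reduce.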
Copyright is held by the author.
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian