Clustering-based web query log anonymization

Author: 
Date created: 
2010-11-15
Identifier: 
etd6310
Keywords: 
Query logs data, privacy-preserving data publishing, transaction data anonymization, item generalization
Abstract: 

Web query logs contain information that can be very useful for research or marketing; however, releasing such data can seriously breach the privacy of search engine users. These privacy concerns go far beyond the identifying information in a query, such as a name or address, that refers to a particular individual. It has been shown that even non-identifying personal data can be combined with publicly available external information to pinpoint an individual, as happened after the release of the AOL query logs in 2006. In this work, we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
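
As a minimal illustration of the two notions named in the abstract, k-anonymity and item generalization, the sketch below treats each user's queries as a transaction (a set of items) and checks whether every distinct transaction occurs at least k times before and after generalization. It is not the thesis's clustering algorithm, which is not described in this record. The taxonomy, the sample queries, and the function names `generalize` and `is_k_anonymous` are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical item taxonomy (child -> parent); not taken from the thesis.
TAXONOMY = {
    "flu symptoms": "health",
    "diabetes diet": "health",
    "cheap flights": "travel",
    "hotel deals": "travel",
}


def generalize(transaction, taxonomy):
    """Replace each item by its parent category, if the taxonomy has one."""
    return frozenset(taxonomy.get(item, item) for item in transaction)


def is_k_anonymous(transactions, k):
    """Return True if every distinct transaction occurs at least k times."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(count >= k for count in counts.values())


# Made-up sample data: one transaction per user.
logs = [
    {"flu symptoms", "cheap flights"},
    {"diabetes diet", "hotel deals"},
    {"flu symptoms", "hotel deals"},
]

generalized = [generalize(t, TAXONOMY) for t in logs]

raw_ok = is_k_anonymous(logs, 2)          # each raw transaction is unique
gen_ok = is_k_anonymous(generalized, 2)   # all become {"health", "travel"}
```

Generalizing every item to its top-level category satisfies k-anonymity here but discards detail. The abstract's claim about data utility concerns limiting that loss, which this sketch does not attempt to model.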

Document type: 
Thesis
Rights: 
Copyright remains with the author. The author granted permission for the file to be printed and for the text to be copied and pasted.
Senior supervisor: 
Ke Wang
Department: 
Applied Science: School of Computing Science
Thesis type: 
((Computing Science) Thesis) M.Sc.