Web query logs contain information that can be very useful for research or marketing; however, releasing such data can seriously breach the privacy of search engine users. These privacy concerns go far beyond identifying information in a query, such as a name or address, that refers to a particular individual. It has been shown that even non-identifying personal data can be combined with publicly available external information to pinpoint an individual, as happened after the release of the AOL query logs in 2006. In this work, we model web query logs as unstructured transaction data and present a novel transaction anonymization technique based on clustering and generalization to achieve k-anonymity. We conduct extensive experiments on the AOL query log data. Our results show that this method yields higher data utility than state-of-the-art transaction anonymization methods.
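To make the k-anonymity notion concrete, the following is a minimal sketch, not the clustering method of this work. It assumes each user's log is a set of query terms (a transaction) and uses a hypothetical one-level taxonomy, `TAXONOMY`, to generalize terms; the function names and sample data are likewise illustrative assumptions. A set of transactions is treated as k-anonymous here when every distinct transaction occurs at least k times.

```python
from collections import Counter

# Hypothetical one-level taxonomy (an assumption for illustration):
# each query term maps to a broader category.
TAXONOMY = {
    "aspirin": "medication",
    "ibuprofen": "medication",
    "sedan": "vehicle",
    "pickup": "vehicle",
}


def generalize(transaction, taxonomy):
    """Replace each item with its parent category; unknown items stay unchanged."""
    return frozenset(taxonomy.get(item, item) for item in transaction)


def is_k_anonymous(transactions, k):
    """Return True if every distinct transaction occurs at least k times."""
    counts = Counter(frozenset(t) for t in transactions)
    return all(count >= k for count in counts.values())


# Made-up sample transactions, one per user.
logs = [
    {"aspirin", "sedan"},
    {"ibuprofen", "pickup"},
    {"aspirin", "pickup"},
]
generalized = [generalize(t, TAXONOMY) for t in logs]
```

In this toy data the raw transactions are all distinct, so they are not 2-anonymous, while after generalization all three become identical. The cost is lost detail, which is the utility trade-off that an anonymization method tries to minimize.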
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke