Enhancing utility in privacy preserving data publishing

Resource type
Thesis type
(Thesis) Ph.D.
Date created
In this age of universal information sharing, information security and privacy protection matter at all times. Information security mainly concerns preventing the disclosure of data to unauthorized parties, while privacy protection aims to protect disclosed data from unintended use, in particular from inference of sensitive information about the individuals involved. This dissertation focuses on privacy preserving data publishing, an important field in privacy protection. Because of its distinctive objective, privacy protection is more challenging than information security. One challenge that has not been well addressed is how to balance privacy and data utility in privacy preserving data publishing. This dissertation addresses this challenge by introducing new privacy notions that are both more general and more flexible, and by proposing optimal solutions and efficient algorithms. Specifically, it makes three efforts to enhance data utility while preventing privacy attacks specific to relational data, set-valued data, and web search data, respectively. First, it proposes the L(+)-diversity notion for relational data, which is more general than the classical l-diversity, together with an optimal algorithm, L(+)-Optimize, which employs a flexible anonymization model. The solution is optimal in data utility and shows a significant gain compared with both a heuristic solution and a solution based on a restrictive anonymization model. Second, it proposes the KL(m, n)-privacy notion for set-valued data. Approaches for relational data are not applicable to set-valued data, and the few existing works on set-valued data provide either over-protection or under-protection. KL(m, n)-privacy, with the optimal algorithm presented in the thesis, provides sufficient protection and retains enough utility. Third, it proposes the vocabulary k-anonymity notion for web search data, together with a semantic similarity based clustering approach. This approach preserves more data utility than state-of-the-art approaches.
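
For readers unfamiliar with the classical l-diversity that the abstract takes as its baseline, the following is a minimal sketch of the simplest ("distinct") form of that check. It is not the thesis's L(+)-diversity notion or its L(+)-Optimize algorithm, neither of which is defined in this record. The function name, column names, and toy table are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """Check classical distinct l-diversity (illustrative sketch only).

    Records sharing the same quasi-identifier values form one group;
    the table passes if every group holds at least `l` distinct
    values of the sensitive attribute.
    """
    groups = defaultdict(set)
    for rec in records:
        key = tuple(rec[q] for q in quasi_identifiers)
        groups[key].add(rec[sensitive])
    return all(len(values) >= l for values in groups.values())


# Hypothetical, already-generalized toy table (assumed data, not from the thesis).
table = [
    {"age": "30-39", "zip": "537**", "disease": "flu"},
    {"age": "30-39", "zip": "537**", "disease": "cancer"},
    {"age": "40-49", "zip": "538**", "disease": "flu"},
    {"age": "40-49", "zip": "538**", "disease": "flu"},
]

two_diverse = is_distinct_l_diverse(table, ["age", "zip"], "disease", 2)
one_diverse = is_distinct_l_diverse(table, ["age", "zip"], "disease", 1)
```

In this toy table the second group contains only one distinct sensitive value, so the check fails for l = 2 and passes for l = 1. Forcing such a group to merge with others is the kind of generalization that costs data utility, which is the trade-off the dissertation targets.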
Copyright statement
Copyright is held by the author.
The author granted permission for the file to be printed and for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke
Member of collection
Attachment Size
etd6142_JLiu.pdf 1.22 MB