Publishing data without revealing sensitive information about individuals is an important problem in computer science. Three methods have been widely used in recent years to protect individual privacy: generalization, bucketization, and randomization. In this thesis, we begin by defining several well-known privacy protection notions, namely k-anonymity, l-diversity, and t-closeness, and by discussing their three major drawbacks: 1) a lack of flexibility in handling different types of variable sensitivity; 2) a large loss of information utility; and 3) vulnerability to auxiliary information. We then propose a new approach that generates multiple-sized buckets to offer better protection of individual privacy. This approach also achieves higher information utility without violating personal privacy. We design two pruning algorithms for two-sized bucketing: loss-based pruning and privacy-based pruning. Both make the two-sized bucketing algorithm run efficiently on real data. We also implement a recursive algorithm to test our multiple-sized bucketing approach. Finally, we apply the approach in empirical studies on real data to demonstrate its effectiveness.
Copyright is held by the author.
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke