Multiple-sized Bucketization For Privacy Protection

Resource type
Thesis type
(Thesis) M.Sc.
Date created
2014-05-05
Authors/Contributors
Author: Wang, Peng
Abstract
Publishing data without revealing sensitive information about individuals is an important problem in computer science. In recent years, several methods have been widely used to protect people’s privacy: generalization, bucketization, and randomization. In this thesis, we begin by defining several well-known privacy protection notions (k-anonymity, l-diversity, and t-closeness) and discussing their three major drawbacks: 1) a lack of flexibility in handling variable sensitivity; 2) a large loss of information utility; and 3) vulnerability to auxiliary information. We then propose a new approach that generates buckets of multiple sizes to offer better protection of individual privacy. This approach also achieves higher information utility without violating personal privacy. We design two pruning algorithms for two-sized bucketing: loss-based pruning and privacy-based pruning. Both make the two-sized bucketing algorithm run efficiently on real data. We also implement a recursive algorithm to test our multiple-sized bucketing approach. Finally, we evaluate the approach in empirical studies on real data to demonstrate its effectiveness.
Document
Identifier
etd8398
Copyright statement
Copyright is held by the author.
Permissions
The author granted permission for the file to be printed, but not for the text to be copied and pasted.
Scholarly level
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke
Member of collection
Attachment Size
etd8398_PWang.pdf 1.19 MB