Han, Chao

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2016-01-15

Authors/Contributors

Author: Han, Chao

Abstract

Most syntactic methods treat non-independent reasoning (NIR) as a privacy violation and smooth the distribution of published data to avoid sensitive NIR, where NIR means that information about one record in the data can be learned from information about other records in the data. The drawback of this approach is that it limits the utility of learning statistical relationships. The differential privacy criterion does not treat NIR as a privacy violation and therefore enables learning statistical relationships, but at the cost of potential disclosures through NIR. In this thesis, we investigate the extent to which private information about an individual may be disclosed through NIR by query answers that satisfy differential privacy. We first define what a disclosure through NIR by randomized query answers means, then present a formal analysis of such disclosures by differentially private query answers. Our analysis on real-life data sets demonstrates that while disclosures through NIR can be eliminated by adopting a more restrictive setting of differential privacy, such settings adversely affect the utility of query answers for data analysis, and this conflict cannot be easily resolved because both disclosures and utility depend on the accuracy of noisy query answers. This study suggests that, under the assumption that disclosure through NIR is a privacy concern, differential privacy is not suitable because it does not provide both privacy and utility. The question is whether it is possible to (1) allow learning statistical relationships, yet (2) prevent sensitive NIR about an individual. In the second part of the thesis, we present a data perturbation and sampling method that achieves both (1) and (2). The enabling mechanism is a new privacy criterion that distinguishes the two types of NIR in (1) and (2) with the help of the law of large numbers. In particular, record sampling effectively prevents the sensitive disclosure in (2) while having less effect on the statistical learning in (1). The data perturbation and sampling method is evaluated on real-life data sets in terms of both sensitive disclosures and utility. Empirical results confirm that disclosures can be prevented with minor loss of utility.
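
The abstract's point that both disclosure and utility depend on the accuracy of noisy query answers can be illustrated with a minimal sketch. The abstract does not say which mechanism the thesis analyzes, so this sketch assumes the standard Laplace mechanism for a count query (sensitivity 1). The function names, the parameter `epsilon`, and the example count are hypothetical and chosen only for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws follows Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count, epsilon, rng):
    # Assumption: a count query has sensitivity 1, so Laplace noise with
    # scale 1/epsilon yields an epsilon-differentially private answer.
    return true_count + laplace_noise(1.0 / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=20000, seed=0):
    # Empirical average of |noisy answer - true answer| over many trials.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(noisy_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    for eps in (0.1, 1.0):
        print(eps, mean_abs_error(true_count=100, epsilon=eps))
```

Under these assumptions, a smaller `epsilon` (a more restrictive privacy setting) gives a larger expected error, roughly `1/epsilon`. The same loss of accuracy that makes inference about an individual harder also degrades the answer for an analyst, which is the conflict the abstract describes.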

Keywords

Identifier

etd9475

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (PhD)

Supervisor or Senior Supervisor

Thesis advisor: Wang, Ke

Member of collection

Computing Science Theses

Download file	Size
etd9475_CHan.pdf	675.84 KB

Sensitive disclosures under differential privacy guarantees
