Differentially private machine learning for decentralized and time-evolving data

Resource type
Thesis
Thesis type
(Thesis) Ph.D.
Date created
2022-12-06
Authors/Contributors
Abstract
Decentralized machine learning focuses on learning from data distributed across multiple related sites where, due to privacy or regulatory concerns, data pooling is not an option. Examples include electronic health records from multiple hospitals and credit card transactions from multiple financial institutions. In contrast to these real-world requirements, current methods in decentralized machine learning, notably federated learning, force participating sites into tight collaboration built on symmetric sharing and shared decision-making. That is, all sites have to contribute their data to benefit from the learning process, and have to share the same model types, architectures, training methodologies, feature and sample spaces, etc. The issues are compounded in the case of privacy preservation and time-evolving data streams, where the sites have to agree on a common, one-size-fits-all privacy budget, and the continuous model updates required for handling time-evolving data streams erode that budget, deteriorating utility. Forced tight collaboration thus creates barriers to participation for sites that want to benefit from other sites' data but do not wish to share their own information or change their existing data analysis practices.

In this thesis, we propose an end-to-end solution for differentially private decentralized learning. Our first contribution is PubSub-ML, a differentially private, decentralized learning framework for static data under loose collaboration. Proposed as an alternative to federated learning, PubSub-ML allows participating sites to retain autonomy over all decisions related to their learning processes. Our second contribution is DP-Ensemble, a differentially private, dynamic model integration approach for a single site that allows unlimited model updates for time-evolving data streams on a fixed privacy budget. Our third contribution extends PubSub-ML to data streams using DP-Ensemble, enabling differentially private, decentralized modeling of data streams under loose collaboration and a fixed privacy budget. The utility of these contributions hinges on the quality of their building blocks, the individual models; our fourth and fifth contributions are therefore high-utility differentially private and non-private models for a single site. All contributions are supported by extensive empirical evaluation.
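As a note on the privacy-budget erosion the abstract refers to: under basic sequential composition in differential privacy (a standard result, not a claim specific to this thesis), releasing $T$ model updates, each $\varepsilon_t$-differentially private on the same data, consumes a total budget of

$\varepsilon_{\text{total}} = \sum_{t=1}^{T} \varepsilon_t = T\varepsilon$ (when each update uses the same per-update budget $\varepsilon$),

so a fixed overall budget forces per-update budgets of $\varepsilon_{\text{total}}/T$, and utility degrades as the stream length $T$ grows. This is the limitation that motivates DP-Ensemble's fixed-budget, unlimited-update design.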
Document
Extent
144 pages.
Identifier
etd22232
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke
Language
English
Member of collection
Download file
etd22232.pdf (3.18 MB)
