Differentially private machine learning for decentralized and time-evolving data

Resource type
Thesis type
(Thesis) Ph.D.
Date created
Decentralized machine learning focuses on learning from data distributed across multiple related sites where, due to privacy or regulatory concerns, pooling the data is not an option. Examples include electronic health records from multiple hospitals and credit card transactions from multiple financial institutions. In contrast to these real-world requirements, current methods in decentralized machine learning, notably federated learning, force participating sites into tight collaboration with symmetric sharing and shared decision making. That is, all sites have to contribute their data to benefit from the learning process, and have to share the same model types, architectures, training methodologies, feature and sample spaces, and so on. The issues are compounded in the case of privacy preservation and time-evolving data streams: the sites have to agree on a common, one-size-fits-all privacy budget, and the continuous model updates required for handling time-evolving data streams erode that budget, deteriorating utility. Forced tight collaboration creates barriers to participation for sites that want to benefit from other sites' data but do not wish to share their own information or change their existing data analysis practices. In this thesis, we propose an end-to-end solution for differentially private decentralized learning. Our first contribution is PubSub-ML, a differentially private, decentralized learning framework under loose collaboration for static data. Proposed as an alternative to federated learning, PubSub-ML allows the participating sites to maintain autonomy over all decisions related to their learning processes. Our second contribution is DP-Ensemble, a differentially private, dynamic model integration approach for a single site that allows unlimited model updates for time-evolving data streams on a fixed privacy budget.
Our third contribution extends PubSub-ML to data streams using DP-Ensemble, allowing differentially private, decentralized modeling of data streams under loose collaboration and a fixed privacy budget. The utility of our contributions hinges on the quality of their building blocks, the individual models. Our fourth and fifth contributions are high-utility differentially private and non-private models for a single site. All contributions are supported by extensive empirical evaluation.
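The abstract's remark that continuous model updates erode a privacy budget can be illustrated with a minimal sketch. It is not the thesis's method (it does not implement PubSub-ML or DP-Ensemble); it only shows the baseline problem. It assumes basic sequential composition, under which a total budget is split evenly across updates, and the Laplace mechanism, whose noise scale is sensitivity divided by epsilon. All function names and numeric values are hypothetical.

```python
import math


def per_update_epsilon(total_epsilon: float, num_updates: int) -> float:
    """Even split of a total budget across updates (basic sequential composition)."""
    if total_epsilon <= 0 or num_updates < 1:
        raise ValueError("total_epsilon must be > 0 and num_updates >= 1")
    return total_epsilon / num_updates


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace mechanism: b = sensitivity / epsilon."""
    return sensitivity / epsilon


def laplace_noise_std(sensitivity: float, epsilon: float) -> float:
    """Standard deviation of Laplace noise with scale b is sqrt(2) * b."""
    return math.sqrt(2) * laplace_scale(sensitivity, epsilon)


if __name__ == "__main__":
    total_epsilon = 1.0  # hypothetical fixed budget
    sensitivity = 1.0    # hypothetical query sensitivity
    for k in (1, 10, 100):
        eps = per_update_epsilon(total_epsilon, k)
        std = laplace_noise_std(sensitivity, eps)
        print(f"updates={k:3d}  epsilon per update={eps:.3f}  noise std={std:.2f}")
```

Under these assumptions the noise added to each release grows linearly with the number of updates, which is the utility loss that an approach with a fixed budget and unlimited updates has to avoid.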
144 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Wang, Ke
Member of collection
Attachment Size
etd22232.pdf 3.18 MB