Resource type
Thesis type
(Thesis) Ph.D.
Date created
2022-07-07
Authors/Contributors
Author: Cong, Zicun
Abstract
Data analytic algorithms are transforming every aspect of our lives through their applications to decision making in a wide spectrum of areas. The lack of trustworthiness in data analytic algorithms raises growing concerns, as untrustworthy analytic algorithms may make unfair, insecure, and unexplainable decisions that can harm society and individuals. To ensure that the development and deployment of data analytic algorithms benefit humans, it is essential to ensure trustworthiness in current and future data analytics practice. Although a number of methods have been proposed for trustworthy data analytics, most existing studies sacrifice considerable utility to achieve trustworthiness. Developing data analytic algorithms with a good trustworthiness-utility tradeoff remains challenging. Interpretability and fairness are two desiderata of trustworthy data analytics. In this thesis, we develop efficient algorithmic tools to tackle two crucial interpretation problems and one fairness problem in data analytics. First, we discuss how to compute exact and consistent interpretations of piecewise linear models hidden behind APIs. The family of piecewise linear models includes many popular classification models, such as neural networks that use activation functions from the ReLU family. Then, we investigate how to efficiently compute comprehensible counterfactual explanations for the Kolmogorov-Smirnov test, a well-known statistical hypothesis test that is widely used to detect changes and abnormalities. Last, we develop a sampling framework to efficiently train fair and accurate graph neural networks, which are state-of-the-art models for many graph analytic tasks. Our work provides powerful algorithmic tools for the aforementioned interpretation and fairness problems, and these tools achieve a superior trustworthiness-utility tradeoff.
We conclude this thesis by discussing some future directions in trustworthy data analytics.
Document
Extent
149 pages.
Identifier
etd22005
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian
Thesis advisor: Wang, Jiannan
Language
English
| Download file | Size |
|---|---|
| etd22005.pdf | 2.69 MB |