Towards trustworthy data analytics: Algorithmic tools for interpretability and fairness

Thesis type: (Thesis) Ph.D.
Author: Cong, Zicun
Data analytic algorithms are transforming every aspect of our lives through their application to decision making in a wide spectrum of areas. The lack of trustworthiness in these algorithms raises growing concerns, because untrustworthy algorithms may make unfair, insecure, and unexplainable decisions that harm society and individuals. To ensure that the development and deployment of data analytic algorithms benefit humans, trustworthiness must be built into current and future practices of data analytics. Although a series of methods has been proposed for trustworthy data analytics, most existing studies sacrifice considerable utility to achieve trustworthiness. Developing data analytic algorithms with a good trustworthiness-utility tradeoff remains challenging. Interpretability and fairness are two desiderata of trustworthy data analytics. In this thesis, we develop efficient algorithmic tools to tackle two crucial interpretation problems and one fairness problem in data analytics. First, we discuss how to compute exact and consistent interpretations of piecewise linear models hidden behind APIs. The family of piecewise linear models includes many popular classification models, such as neural networks with ReLU-family activation functions. Second, we investigate how to efficiently compute comprehensible counterfactual explanations for the Kolmogorov-Smirnov test, a well-known statistical hypothesis test widely used to detect changes and abnormalities. Last, we develop a sampling framework to efficiently train fair and accurate graph neural networks, the state-of-the-art analytic algorithms for many graph analytic tasks. Our work provides powerful algorithmic tools for these interpretation and fairness problems that achieve a superior trustworthiness-utility tradeoff. We conclude this thesis by discussing some future directions in trustworthy data analytics.
149 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Pei, Jian
Thesis advisor: Wang, Jiannan
Attachment: etd22005.pdf (2.69 MB)