Cong, Zicun

Resource type

Thesis

Thesis type

(Thesis) Ph.D.

Date created

2022-07-07

Authors/Contributors

Author: Cong, Zicun

Abstract

Data analytic algorithms are transforming every aspect of our lives through their applications to decision making in a wide spectrum of areas. The lack of trustworthiness in these algorithms raises growing concerns, as untrustworthy analytic algorithms may make unfair, insecure, and unexplainable decisions that can harm society and individuals. To ensure that the development and deployment of data analytic algorithms benefit humans, it is essential to ensure trustworthiness in current and future practices of data analytics. Although a series of methods has been proposed for trustworthy data analytics, most existing studies sacrifice considerable utility to achieve trustworthiness. Developing data analytic algorithms with a good trustworthiness-utility tradeoff remains challenging. Interpretability and fairness are two desiderata of trustworthy data analytics. In this thesis, we develop efficient algorithmic tools to tackle two crucial interpretation problems and one fairness problem in data analytics. In particular, we first discuss how to compute exact and consistent interpretations of piecewise linear models hidden behind APIs. The family of piecewise linear models includes many popular classification models, such as neural networks that use activation functions from the ReLU family. Then, we investigate how to efficiently compute comprehensible counterfactual explanations for the Kolmogorov-Smirnov test, a well-known statistical hypothesis test that is widely used to detect changes and abnormalities. Last, we develop a sampling framework to efficiently train fair and accurate graph neural networks, which are state-of-the-art analytic algorithms for many graph analytic tasks. Our work provides powerful algorithmic tools for the aforementioned interpretation and fairness problems and achieves a superior trustworthiness-utility tradeoff. We conclude this thesis by discussing some future directions in trustworthy data analytics.

Extent

149 pages.

Identifier

etd22005

Copyright statement

Copyright is held by the author(s).

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Supervisor or Senior Supervisor

Thesis advisor: Pei, Jian

Thesis advisor: Wang, Jiannan

Language

English

Member of collection

Computing Science Theses

Download file	Size
etd22005.pdf	2.69 MB

Towards trustworthy data analytics: Algorithmic tools for interpretability and fairness
