Wu, Weiyuan

Resource type

Thesis

Thesis type

(Thesis) M.Sc.

Date created

2019-08-22

Authors/Contributors

Author: Wu, Weiyuan

Abstract

Supporting SQL-ML queries in database systems has recently attracted great attention in industry. A SQL-ML query treats ML models as user-defined functions and embeds them into a SQL query. Since ML models do not always produce perfect predictions, a user may find the answer to a SQL-ML query different from what she expects and ask the system to provide an explanation. Although SQL-only and ML-only explanation have been well studied in the literature, to the best of our knowledge, we are the first to study the SQL-ML explanation problem. This thesis makes two major contributions. Firstly, we propose a formal definition of the SQL-ML explanation problem. Intuitively, our definition aims to trace the query answer back to the training data and identify a small number of training examples that have the biggest impact on the query answer. Secondly, we study how to extend existing explanation frameworks to solve our problem and discuss their limitations. To overcome these limitations, we propose InfComp, a novel influence-function-based approach for SQL-ML explanation. We find that InfComp is a powerful tool for debugging training data (i.e., detecting corrupted features and mislabeled instances). We conduct extensive experiments using three real applications (Entity Resolution, Image Classification, and Spam Detection) and compare with the state-of-the-art approaches. Results show that InfComp identifies erroneous training examples more accurately than the baselines, and does so efficiently.

Keywords

Identifier

etd20476

Copyright statement

Copyright is held by the author.

Permissions

This thesis may be printed or downloaded for non-commercial research and scholarly purposes.

Scholarly level

Graduate student (Masters)

Supervisor or Senior Supervisor

Thesis advisor: Wang, Jiannan

Member of collection

Computing Science Theses

Language

English

Enabling SQL-ML Explanation to Debug Training Data
