Accelerating human-in-the-loop data analytics

Resource type
Thesis type
(Thesis) Ph.D.
Date created
2022-12-01
Authors/Contributors
Abstract
Data analytics is essential to data-driven decision-making. While batch analytics is often run offline and can take several hours or even days to generate results, human-in-the-loop analytics requires a fast response. Accelerating human-in-the-loop data analytics, however, remains a challenging problem, and the challenge comes from both the machine side and the human side. On the machine side, there is a gap between the massive volume of data to be processed and the limited hardware resources, which are constrained by practical considerations such as price. On the human side, there is a gap between limited human attention and the enormous number of details that must be attended to in order to finish a task. In this thesis, we develop several systems that accelerate human-in-the-loop data analytics on both sides. The thesis contains two parts. In the first part, we present two systems (AQP++ and SamComb) that accelerate machine processing. To achieve interactive response times, our key idea is to reduce the amount of data that the machine needs to process. We focus on the online analytical processing (OLAP) scenario and leverage sampling-based approximate query processing (AQP) techniques to reduce the data. An AQP system can return an approximate query result in a short time. To improve the estimation quality, we propose combining different data summaries: in AQP++, we combine samples with pre-computed aggregations; in SamComb, we combine different types of samplers. In the second part, we present one system that accelerates human analytics. We focus on the exploratory data analysis (EDA) scenario and propose a task-centric EDA system named DataPrep.EDA. DataPrep.EDA allows data scientists to declaratively specify a wide range of EDA tasks with a single function call. In this way, humans can pay more attention to deciding which task to perform, while the system handles the implementation details automatically.
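As a rough illustration of the sampling-based AQP idea mentioned in the abstract, the following minimal Python sketch estimates a range SUM from a uniform sample, and then shows one way a pre-computed aggregate could be combined with a sample: the exact pre-computed answer is used as a starting point, and the sample only estimates the difference between the query and the pre-computed range. This is a sketch under stated assumptions, not the thesis's implementation: the function names, the (key, value) row layout, and the simple standard-error formula (which ignores the finite population correction) are all hypothetical choices made for illustration.

```python
import random
import statistics


def exact_sum(rows, lo, hi):
    """Exact SUM(value) over (key, value) rows with lo <= key <= hi."""
    return sum(v for k, v in rows if lo <= k <= hi)


def sample_sum(sample, n_total, lo, hi):
    """Estimate SUM(value) for lo <= key <= hi from a uniform sample.

    Returns (estimate, standard_error). Rows outside the range contribute 0,
    and the sample mean is scaled up by the table size n_total.
    """
    contrib = [v if lo <= k <= hi else 0.0 for k, v in sample]
    est = n_total * statistics.fmean(contrib)
    se = n_total * statistics.stdev(contrib) / len(contrib) ** 0.5
    return est, se


def sample_sum_with_precomputed(sample, n_total, query, pre_range, pre_value):
    """Combine a pre-computed exact SUM with a sample (illustrative only).

    pre_value is the exact SUM over pre_range, computed ahead of time.
    The sample estimates only the difference between the query range
    and the pre-computed range, which is added to pre_value.
    """
    (q_lo, q_hi), (p_lo, p_hi) = query, pre_range
    diff = [
        (v if q_lo <= k <= q_hi else 0.0) - (v if p_lo <= k <= p_hi else 0.0)
        for k, v in sample
    ]
    est = pre_value + n_total * statistics.fmean(diff)
    se = n_total * statistics.stdev(diff) / len(diff) ** 0.5
    return est, se


if __name__ == "__main__":
    random.seed(0)  # fixed seed so repeated runs use the same sample
    rows = [(k, float(k % 7)) for k in range(10_000)]  # synthetic table
    sample = random.sample(rows, 500)                  # uniform sample
    pre_range = (0, 4_999)
    pre_value = exact_sum(rows, *pre_range)            # done offline in practice
    query = (0, 5_200)
    print("exact:      ", exact_sum(rows, *query))
    print("sample only:", sample_sum(sample, len(rows), *query))
    print("combined:   ", sample_sum_with_precomputed(
        sample, len(rows), query, pre_range, pre_value))
```

When the query range is close to the pre-computed range, most sampled rows contribute a difference of zero, so the combined estimate tends to have a smaller standard error than the sample-only estimate.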
Document
Extent
118 pages.
Identifier
etd22239
Copyright statement
Copyright is held by the author(s).
Permissions
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Wang, Jiannan
Language
English
Member of collection
Download file Size
etd22239.pdf 2.71 MB

Views & downloads - as of June 2023

Views: 34
Downloads: 2