Resource type
Thesis type
(Thesis) Ph.D.
Date created
2024-02-28
Authors/Contributors
Author: Lu, Fangzhou (Alec)
Abstract
With the ever-increasing amount of user data produced worldwide, today's big data analytics engines are constantly under pressure to keep up with the rapidly increasing demand for faster processing of more complex workloads. Over the years, FPGA-based hardware accelerators have been shown to deliver significant performance improvements with high energy efficiency for these data-intensive workloads. However, for the vast majority of software programmers who use high-level synthesis (HLS), it is nontrivial to develop efficient FPGA accelerators and to automatically leverage them in real-world analytics workloads. In this thesis, we adopt a pragmatic approach to exploring hardware acceleration for big data analytics workloads.

As the first step, we want to understand how software programmers can develop performance-efficient FPGA accelerators. While computation customization is well analyzed in prior studies, we find that efficient (off-chip) memory access customization is often overlooked. To investigate efficient memory access in HLS-based accelerator designs on modern FPGAs, we first identify five key factors: 1) the clock frequency of the accelerator design, 2) the number of concurrent memory access ports, 3) the data width of each port, 4) the maximum burst access length for each port, and 5) the size of consecutive data accesses. We then develop a set of HLS-C/C++ based microbenchmarks to quantitatively evaluate the effective bandwidth, latency, and resource usage of off-chip memory access and accelerator-to-accelerator streaming on modern FPGAs. Based on the quantitative evaluation, we further derive numerous insights into efficient memory access for software programmers using HLS.

In the second work, we conduct a case study to demonstrate the promise of software-defined FPGA accelerators for vector databases. A vector database represents unstructured data (such as image and video data) as vectors. One of the key operations in a vector database is the K-Nearest Neighbors (KNN) query, i.e., finding the k most similar results for a given query. In this work, we develop an automation framework for generating high-performance KNN accelerators on cloud FPGAs by leveraging the derived memory insights, as well as data parallelism and pipeline parallelism in the computation customization. Compared to an optimized 16-thread CPU implementation, our designs, under a comprehensive set of configurations, achieve an average speedup of 7.5x and 19.8x on the AMD/Xilinx Alveo U200 and U280 FPGAs, respectively.

Lastly, to achieve a broader impact on more big data analytics workloads, we also develop SQL2FPGA, an accelerator-aware compiler that automatically maps SQL queries onto heterogeneous CPU-FPGA platforms to accelerate their execution. The current SQL2FPGA prototype is built on the widely used Apache Spark SQL framework. First, the front-end of SQL2FPGA takes an optimized logical plan of a SQL query from a database query engine and transforms it into a unified operator-level intermediate representation. Then, to generate an optimized FPGA-aware physical plan, SQL2FPGA applies a set of compiler optimization passes to 1) improve operator acceleration coverage by the FPGA, 2) eliminate redundant computation during physical execution, and 3) minimize data transfer overhead between operators on the CPU and FPGA. Finally, SQL2FPGA generates the associated query acceleration code for deployment on the heterogeneous CPU-FPGA system. Compared to Spark SQL running on the CPU, our compilation framework, using two AMD/Xilinx Alveo U280 FPGA boards, achieves an average speedup of 10.1x and 13.9x across all 22 TPC-H benchmark queries at scale factors of 1 GB and 30 GB, respectively.
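For readers unfamiliar with the KNN query mentioned in the abstract, the following is a minimal software sketch of a brute-force version. It is an illustration only and is not taken from the thesis: the function names `squared_l2` and `knn` are hypothetical, and squared Euclidean distance is an assumed similarity metric. Each distance computation is independent of the others, which is the kind of data parallelism an accelerator could exploit.

```python
import heapq
from typing import List, Sequence, Tuple


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(query: Sequence[float],
        dataset: Sequence[Sequence[float]],
        k: int) -> List[Tuple[float, int]]:
    """Return the k closest dataset entries as (distance, index) pairs, nearest first."""
    # One independent distance computation per stored vector.
    distances = ((squared_l2(query, vec), idx) for idx, vec in enumerate(dataset))
    # Keep only the k smallest distances instead of sorting the whole dataset.
    return heapq.nsmallest(k, distances)


if __name__ == "__main__":
    data = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.5, 0.5]]
    print(knn([0.0, 0.0], data, k=2))
```

The top-k selection via a bounded heap avoids a full sort of all distances; whether and how the thesis's generated accelerators perform this step is not stated in the abstract.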
Document
Extent
155 pages.
Identifier
etd22921
Copyright statement
Copyright is held by the author(s).
Supervisor or Senior Supervisor
Thesis advisor: Fang, Zhenman
Language
English
Member of collection
Download file | Size |
---|---|
etd22921.pdf | 14.78 MB |