An intermediate representation for transforming and optimizing the microarchitecture of application accelerators

Author: 
Date created: 
2020-07-27
Identifier: 
etd21042
Keywords: 
Accelerator design
High-Level Synthesis
Configurable architecture
FPGA
Abstract: 

In recent years, the computing landscape has shifted towards specialized accelerators, since scaling of computational capacity is no longer guaranteed with every technology generation. Reconfigurable architectures like FPGAs are promising for accelerator implementation. FPGAs allow implementation of arbitrary logic functions for different classes of applications, with better performance per watt than CPUs and GPUs and without re-spinning the circuits as fixed-function ASICs require. Unfortunately, the software programmer community has stayed away from this technology, primarily because of the abstraction gap that exists between software languages and hardware design. Hardware description languages (HDLs) are very low-level, and a hardware designer must think about the design in terms of low-level building blocks such as gates and registers. The alternative to HDLs is high-level synthesis (HLS) tools. HLS frameworks synthesize hardware from a high-level description of an algorithm in the form of untimed mathematical expressions and nested, pipelined, and parallel loops in software languages. The primary limitation of HLS is that functionality and microarchitecture are conflated in a single language. As a result, making changes to the accelerator design requires code restructuring, and microarchitecture optimizations are tied to program correctness. In this thesis, we propose two new abstractions to decouple functionality from microarchitecture. The first abstraction is a hierarchical intermediate representation for describing parameterized accelerator microarchitecture, targeting reconfigurable architectures. In this abstraction, we represent the accelerator as a concurrent structural graph in which components roughly correspond to microarchitecture-level hardware blocks. We describe the methods we used to lower the entire application graph into a parameterized intermediate hardware abstraction, μIR. 
We describe the implementation of this intermediate abstraction and an associated pass framework, μopt. We then discuss some of the compiler optimizations that μIR enables, including timing, spatial, and higher-order optimizations. The final system is a compiler stack that takes a high-level program as input and translates it into an optimized, synthesizable hardware design. The second abstraction is a sequence of instructions that conveys the producer-consumer locality between dependent instructions. We show that this new abstraction adapts to different acceleration behaviors by varying the length and the types of fused instructions.

Document type: 
Thesis
Rights: 
This thesis may be printed or downloaded for non-commercial research and scholarly purposes. Copyright remains with the author.
Supervisor(s): 
Arrvindh Shriraman
Department: 
Applied Sciences: School of Computing Science
Thesis type: 
(Thesis) Ph.D.