Designing an IEEE Floating-Point Unit with configurable compliance support and precision for FPGA-based soft-processors

Thesis type
(Thesis) M.A.Sc.
Author: Gao, Yuhui
Field Programmable Gate Arrays (FPGAs) are commonly used to accelerate floating-point applications. Advances in FPGA technology and the introduction of the RISC-V Instruction Set Architecture (ISA) have together enabled a number of soft-processor designs. Although researchers have extensively studied FPGA-based floating-point implementations, existing work has largely focused on standalone, frequency-optimized data-path designs. Such designs are not suitable for soft-processors targeting FPGAs because of the units' long latency and the soft-processors' innate frequency ceiling. Furthermore, the few existing integrated Floating Point Unit (FPU) hardware implementations targeting FPGA-based soft-processors are not IEEE 754 compliant. We present a floating-point unit for FPGA-based RISC-V soft-processors that is fully IEEE compliant and configurable. Our design focuses on maximizing runtime performance with efficient resource utilization. Users can configure the FPU to one of four levels of compliance, or select reduced-precision configurations. Using a set of real-world floating-point applications as benchmarks, we evaluate the FPU variants in terms of resource usage, operating frequency, runtime performance, and performance efficiency. We also present trade-off analyses of two microarchitecture design choices. Our fully compliant FPU uses 5423 Look-Up Tables (LUTs) and achieves an operating frequency of 105 MHz. The key results from our work demonstrate the effect of running floating-point workloads on reduced-compliance FPUs. Our experiments show that reducing the Fused Multiply-Add (FMA) unit's intermediate representation leads to a 25% reduction in LUT usage, which translates to an average 46% increase in performance efficiency.
Additionally, disabling denormal support reduces resource utilization by 10% and improves the clock frequency by 6%, which results in a 14% higher performance efficiency, while having no impact on the result accuracy for our benchmark applications. Furthermore, we find that running applications in reduced precision can improve runtime performance by up to 75%, although applications may suffer from significant loss of precision.
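As a rough illustration of what "disabling denormal support" means at the value level, the following Python sketch flushes IEEE 754 binary32 subnormal (denormal) values to a signed zero. It is a generic software model, not the thesis's hardware design: the function names are hypothetical, and whether the FPU flushes inputs, outputs, or both is an assumption not stated in the abstract.

```python
import math
import struct


def f32_bits(x: float) -> int:
    """Round a Python float to binary32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def is_subnormal_f32(x: float) -> bool:
    """True if x, rounded to binary32, is subnormal:
    exponent field all zeros and a non-zero fraction field."""
    bits = f32_bits(x)
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


def flush_to_zero_f32(x: float) -> float:
    """Hypothetical model of an FPU without denormal support:
    subnormal values are replaced by a zero of the same sign;
    all other values are simply rounded to binary32."""
    if is_subnormal_f32(x):
        return -0.0 if (f32_bits(x) >> 31) else 0.0
    return struct.unpack("<f", struct.pack("<f", x))[0]


if __name__ == "__main__":
    smallest_normal = 2.0 ** -126     # smallest positive normal binary32 value
    largest_subnormal_step = 2.0 ** -127  # a subnormal binary32 value
    print(is_subnormal_f32(smallest_normal))
    print(is_subnormal_f32(largest_subnormal_step))
    print(flush_to_zero_f32(largest_subnormal_step))
```

A workload whose values never enter the subnormal range would see identical results with or without this flushing, which is consistent with the abstract's observation that accuracy was unaffected for the benchmark applications.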
85 pages.
Copyright statement
Copyright is held by the author(s).
This thesis may be printed or downloaded for non-commercial research and scholarly purposes.
Supervisor or Senior Supervisor
Thesis advisor: Shannon, Lesley
Attachment: etd22314.pdf (1.54 MB)