Integer Computations with Soft GPGPU on FPGAs
Muhammed Soubhi Al Kadi, Michael Hübner
2016 International Conference on Field-Programmable Technology (FPT '16)
This paper explores the capabilities and limitations of soft GPGPU-based computing for fixed-point arithmetic. The work is based on an existing soft GPU architecture, which has been improved and extended to cover a broader set of benchmarks. The ISA of the enhanced architecture supports conditional instructions and global atomic operations. We extended the tool flow with an LLVM backend and used the Clang frontend to provide an OpenCL compiler. The improved architecture is evaluated against multiple other solutions: a single MicroBlaze soft processor, an ARM Cortex-A9 with the NEON vector coprocessor, and equivalent HLS implementations. We recorded average speedups of 10-47x over the MicroBlaze and 0.9-4.6x over the ARM with the NEON engine for the smallest and the biggest soft GPU cores, respectively. Although these cores have an area overhead of 6-22x in comparison to the single soft processor solution, they consumed on average 2.8-7.1x less energy to perform the same tasks. We observed no performance degradation in comparison to the HLS implementations.
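As a rough illustration of what fixed-point arithmetic on integer hardware involves, the following minimal sketch represents real values as scaled integers and multiplies them with an integer multiply followed by a shift. The Q16.16 format and the helper names (`to_fixed`, `fixed_mul`, `fixed_dot`) are assumptions chosen for this example; they are not taken from the paper or its benchmarks.

```python
# Hypothetical Q16.16 fixed-point helpers (format and names are assumptions
# for illustration, not the paper's implementation).
FRAC_BITS = 16          # number of fractional bits
ONE = 1 << FRAC_BITS    # fixed-point representation of 1.0


def to_fixed(x: float) -> int:
    """Scale a real value by 2**16 and round it to an integer."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16.16 integer back to a real value."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw integer product carries 32 fractional bits, so it is shifted
    right by 16 to return to Q16.16. Python integers do not overflow; on
    32-bit hardware the intermediate product would need 64 bits.
    """
    return (a * b) >> FRAC_BITS


def fixed_dot(xs, ys) -> int:
    """Dot product of two Q16.16 vectors using only integer operations."""
    acc = 0
    for a, b in zip(xs, ys):
        acc += fixed_mul(a, b)
    return acc


if __name__ == "__main__":
    xs = [to_fixed(0.5), to_fixed(2.0)]
    ys = [to_fixed(4.0), to_fixed(0.25)]
    print(to_float(fixed_dot(xs, ys)))
```

The multiply-then-shift pattern uses only integer multiply, add, and shift operations, which is why such kernels can run on a datapath without floating-point units.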