Design of an adaptive reconfigurable general-purpose floating-point accelerator
-
Abstract: Application-specific accelerators are highly specialized and therefore often lack flexibility, which inevitably lowers their energy efficiency when they run different types of applications. This paper presents an adaptive reconfigurable floating-point accelerator that maps computing tasks to reconfigurable computing resources at runtime, based on the requirements of each task and the current usage of those resources. The accelerator adopts a "RISC-V + reconfigurable floating-point arithmetic unit" architecture. The reconfigurable floating-point arithmetic unit consists of a series of coarse-grained floating-point arithmetic units that carry out the actual floating-point calculations. The design was prototyped and verified on a Xilinx UltraScale XCVU440 FPGA, and the results show that the accelerator has broad applicability, high computational efficiency, and strong adaptability to different algorithms.
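The runtime mapping described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `ArithmeticUnit`, `Task`, `Mapper`, `wanted_units`, and `min_units` are hypothetical, and the first-fit policy shown here is a stand-in, not the mapping algorithm used in the paper. It shows the general idea of granting a task as many free units as its requirements and the current resource usage allow.

```python
from dataclasses import dataclass


@dataclass
class ArithmeticUnit:
    """One coarse-grained floating-point unit (hypothetical model)."""
    uid: int
    busy: bool = False


@dataclass
class Task:
    """A computing task and its resource requirements (hypothetical model)."""
    name: str
    wanted_units: int   # number of units the task would ideally use
    min_units: int = 1  # fewest units the task can run on


class Mapper:
    """Assumed first-fit mapper: grants free units to tasks at runtime."""

    def __init__(self, n_units):
        self.units = [ArithmeticUnit(i) for i in range(n_units)]
        self.assignments = {}  # task name -> list of granted unit ids

    def free_units(self):
        return [u for u in self.units if not u.busy]

    def map_task(self, task):
        """Return the granted unit ids, or None if too few units are free."""
        free = self.free_units()
        if len(free) < task.min_units:
            return None  # caller may retry after another task releases units
        granted = free[:min(task.wanted_units, len(free))]
        for unit in granted:
            unit.busy = True
        ids = [unit.uid for unit in granted]
        self.assignments[task.name] = ids
        return ids

    def release(self, task_name):
        """Mark the units of a finished task as free again."""
        for uid in self.assignments.pop(task_name, []):
            self.units[uid].busy = False
```

With eight units, a task that wants six receives six, a second task that wants four receives only the remaining two, and a third task has to wait until `release` frees some units. A hardware implementation would also have to configure the interconnect between the granted units, which this model leaves out.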
-
Key words:
- reconfiguration
- floating-point accelerator
- RISC-V
- high-density computing
- FPGA
-
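Table 1 Accelerator resource consumption

| Module | CLB LUTs | Block RAM | DSPs | CLBs |
| --- | --- | --- | --- | --- |
| RISC-V | 1158 | 4 | 0 | 254 |
| RFU | 9318 | 0 | 14 | 1665 |
| Data memory | 1638 | 16 | 0 | 331 |
| Controller | 1118 | 0 | 0 | 421 |
| LOAD/STORE | 121 | 2 | 0 | 122 |
| RGFA | 13341 | 22 | 14 | 2386 |

Table 2 Execution cycles for different applications

| Application | Cortex-A9 | Floating-point accelerator (8 basic arithmetic units) | Floating-point accelerator (16 basic arithmetic units) |
| --- | --- | --- | --- |
| Complex_mul_1K | 43205 | 1437 | 744 |
| Dot_product_1K | 14049 | 7802 | 4493 |
| Matrix_mul_64x64 | 5959203 | 1163264 | 678384 |
| FFT_8_1K | 88236 | 21482 | 16712 |
| Jacobi_16x16 | 3680273 | 1523465 | 1274406 |
-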
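The cycle counts in Table 2 are easier to compare as ratios. The short script below is a sketch that computes, for each application, the Cortex-A9 cycle count divided by the accelerator cycle count. The numbers are copied from Table 2. The names `CYCLES` and `speedups` are introduced here for illustration only. The result is a ratio of cycle counts, not of wall-clock time, because clock frequencies are not given in this section.

```python
# Cycle counts copied from Table 2:
# (Cortex-A9, accelerator with 8 units, accelerator with 16 units)
CYCLES = {
    "Complex_mul_1K": (43205, 1437, 744),
    "Dot_product_1K": (14049, 7802, 4493),
    "Matrix_mul_64x64": (5959203, 1163264, 678384),
    "FFT_8_1K": (88236, 21482, 16712),
    "Jacobi_16x16": (3680273, 1523465, 1274406),
}


def speedups(cycles):
    """Return {application: (ratio_8_units, ratio_16_units)} vs. Cortex-A9."""
    return {
        app: (base / c8, base / c16)
        for app, (base, c8, c16) in cycles.items()
    }


if __name__ == "__main__":
    for app, (s8, s16) in speedups(CYCLES).items():
        print(f"{app:18s} {s8:7.2f}x {s16:7.2f}x")
```

The ratios vary widely between applications: Complex_mul_1K needs roughly 30 times fewer cycles with 8 units and roughly 58 times fewer with 16, while Dot_product_1K improves by only about 1.8 and 3.1 times.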