WU Xifeng, Gong Jie, Fan Jun, He Hu. Accelerating RNNs on FPGA with HBM[J]. Microelectronics & Computer, 2021, 38(10): 79-84. DOI: 10.19304/J.ISSN1000-7180.2021.0023

Accelerating RNNs on FPGA with HBM

  • Aiming at the problem that recurrent neural network (RNN) algorithms are limited by memory bandwidth, an accelerator SoC based on HBM is designed that can generically support the RNN and its variants. First, the structures of the RNN and its variants are analyzed, along with the computation and storage requirements of the algorithms. Then, a high-bandwidth accelerator design based on HBM is proposed and deployed on the Xilinx VCU128 development board. Finally, guided by Roofline-model analysis, the bandwidth utilization and computation density are improved. The average inference performance on the DeepSpeech2 and GNMT algorithms is 61.74 GFLOPS and 20 GFLOPS respectively. Compared with a design based on DDR memory, performance is improved by 3.68 times; compared with other floating-point 32-bit FPGA-based RNN accelerator designs, performance is improved by 8.5 times. The design proposes a data-scheduling method for multi-channel memory and can adapt to different recurrent neural network applications.
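The Roofline analysis mentioned above bounds a kernel's attainable throughput by the lesser of peak compute and memory bandwidth times arithmetic intensity; for the low-intensity matrix-vector products that dominate RNN inference, this shows why HBM's extra bandwidth raises the ceiling. A minimal sketch of that bound, using illustrative placeholder numbers rather than the VCU128's actual specifications:

```python
def roofline_bound(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOPS under the Roofline model: min of the compute
    ceiling and the memory ceiling (bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# FP32 matrix-vector products have low arithmetic intensity (each weight
# is loaded once and used once), so RNN inference sits on the bandwidth-
# bound side of the roofline. Peak compute and bandwidth values below are
# hypothetical, chosen only to contrast a DDR channel with HBM.
intensity = 0.5  # FLOPs per byte, typical order for FP32 GEMV

ddr_bound = roofline_bound(100.0, 19.2, intensity)   # single DDR channel
hbm_bound = roofline_bound(100.0, 460.0, intensity)  # aggregate HBM stacks

print(f"DDR-limited: {ddr_bound} GFLOPS, HBM-limited: {hbm_bound} GFLOPS")
```

At this intensity the DDR design is bandwidth-bound while the HBM design reaches the compute ceiling, which is the qualitative effect the paper's Roofline-guided optimization exploits.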
