Citation: QIU Yue, MA Wen-tao, CHAI Zhi-lei. Design and Implementation of a Convolutional Neural Network Accelerator Based on FPGA[J]. Microelectronics & Computer, 2018, 35(8): 68-72, 77.

Design and Implementation of a Convolutional Neural Network Accelerator Based on FPGA

  • In the ZynqNet hardware design implemented on FPGA, the parallelism of the convolution unit is low and the storage structure depends almost entirely on off-chip memory. An FPGA accelerator optimization based on ZynqNet is proposed that is also easy to apply to other CNN models. Double buffering keeps intermediate results of the network on chip to reduce off-chip memory accesses, and the data precision is reduced from 32 bits to 16 bits, which makes it possible to design a parallel structure of 64 convolution operation units and increase computational parallelism. Results on ImageNet show that the optimized FPGA accelerator achieves a peak performance of 1.85 GMAC/s at 200 MHz, a 10x speedup over the original ZynqNet and a 20x speedup over an Intel i5-5200U CPU. In terms of performance-to-power ratio, the FPGA accelerator is 5.4 times that of an NVIDIA GTX 970 GPU implementation.
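As a rough illustration of the two optimizations summarized in the abstract (on-chip double buffering of intermediate results and a 64-way parallel array of 16-bit MAC units), the HLS-style C++ sketch below shows how such a structure is commonly expressed. The function names, tile size, and buffer organization are illustrative assumptions, not the paper's actual implementation.

```cpp
// Hypothetical sketch of: (1) ping-pong (double) buffering so intermediate
// data stays on chip, and (2) 64 parallel 16-bit multiply-accumulate units.
// Names, tile sizes, and structure are assumptions, not taken from the paper.
#include <cstdint>
#include <cstring>

constexpr int PE   = 64;     // number of parallel convolution (MAC) units
constexpr int TILE = 1024;   // assumed on-chip tile size (elements)

using fixed16 = int16_t;     // 16-bit fixed-point activations/weights
using acc32   = int32_t;     // wider accumulator to avoid overflow

// One step of 64 parallel multiply-accumulates (fully unrolled in hardware).
static void mac_array(const fixed16 *act, const fixed16 *wgt, acc32 *acc) {
    for (int p = 0; p < PE; ++p) {   // #pragma HLS UNROLL
        acc[p] += static_cast<acc32>(act[p]) * static_cast<acc32>(wgt[p]);
    }
}

// Double-buffered loop: while one buffer is being computed on, the other is
// refilled from external memory; in hardware the two phases overlap.
void conv_tiles(const fixed16 *ddr_act, const fixed16 *ddr_wgt,
                acc32 *out, int num_tiles) {
    static fixed16 act_buf[2][TILE];   // ping-pong activation buffers (BRAM)
    static fixed16 wgt_buf[2][TILE];   // ping-pong weight buffers (BRAM)
    acc32 acc[PE] = {0};

    // Preload the first tile into buffer 0.
    std::memcpy(act_buf[0], ddr_act, TILE * sizeof(fixed16));
    std::memcpy(wgt_buf[0], ddr_wgt, TILE * sizeof(fixed16));

    for (int t = 0; t < num_tiles; ++t) {
        const int cur = t & 1, nxt = cur ^ 1;
        // Prefetch the next tile into the idle buffer.
        if (t + 1 < num_tiles) {
            std::memcpy(act_buf[nxt], ddr_act + (t + 1) * TILE, TILE * sizeof(fixed16));
            std::memcpy(wgt_buf[nxt], ddr_wgt + (t + 1) * TILE, TILE * sizeof(fixed16));
        }
        // Compute on the current buffer, PE elements per step.
        for (int i = 0; i + PE <= TILE; i += PE) {
            mac_array(&act_buf[cur][i], &wgt_buf[cur][i], acc);
        }
    }
    for (int p = 0; p < PE; ++p) out[p] = acc[p];
}
```

In this sketch the narrower 16-bit datapath is what makes the 64-wide MAC array affordable in DSP and routing resources, while the ping-pong buffers hide the external-memory latency behind computation; both points mirror the optimizations described in the abstract.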
