Citation: LI Cen, HE Guang-hui. Design of FPGA-based neural network accelerator for real-time objective detection[J]. Microelectronics & Computer, 2020, 37(7): 6-11.

Design of FPGA-based neural network accelerator for real-time objective detection

Abstract: Implementing object detection algorithms such as YOLO on an FPGA requires optimization at multiple levels, from model quantization to hardware design. To reduce hardware latency, three techniques are used: (1) layer fusion and bit-width quantization reduce computational complexity; (2) a column-based pipeline architecture with a padding-skip technique shortens the pipeline start-up time; and (3) a design space exploration algorithm balances the pipeline and improves DSP efficiency. To validate the proposed neural network accelerator architecture, a YOLO network with 1280×384 input is implemented on a ZC706 FPGA, achieving a 1.97× latency reduction or a 1.54× DSP efficiency improvement over traditional accelerators.
