A BNN Accelerator Based on Time-Sharing Reuse of a Row Convolution Look-Up Table

Abstract: Binary Neural Networks (BNNs) use single-bit data, which alleviates the large data volume and heavy computation of conventional Convolutional Neural Networks (CNNs). To further accelerate BNN forward inference and reduce power consumption, a fully binarized convolutional neural network accelerator based on FPGA is proposed, in which both the input image and the edge padding are binarized. The accelerator skips redundant computations by reusing the Row Convolution LUT (RC-LUT) in a time-sharing manner. Evaluated on a Xilinx ZCU102 FPGA development board, the accelerator achieves a throughput of more than 3.1 TOP/s, an area efficiency of 144.2 GOPS/KLUT, and a power efficiency of 3507.8 GOPS/W.
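The abstract does not spell out how the row convolution LUT is organized, so the following is only a minimal software sketch of one plausible reading, not the accelerator's actual design. It assumes a 3x3 binary kernel, values in {-1, +1} encoded as bits {0, 1}, and no padding. Under those assumptions, each kernel row can produce only 2^3 = 8 distinct partial sums, so they can be precomputed once and looked up at every window position instead of being recomputed. The names `build_row_lut`, `conv2d_lut`, and `conv2d_direct` are hypothetical.

```python
def build_row_lut(w_bits, k=3):
    """Precompute the +/-1 dot product of one k-bit kernel row
    with every possible k-bit input window (2**k entries)."""
    mask = (1 << k) - 1
    lut = []
    for x in range(1 << k):
        matches = bin(~(x ^ w_bits) & mask).count("1")  # XNOR, then popcount
        lut.append(2 * matches - k)  # matches minus mismatches
    return lut


def conv2d_lut(image_rows, kernel_rows, width):
    """Binary 2-D convolution (no padding) using per-row lookup tables.

    image_rows: list of ints; bit c of each int is the pixel at column c.
    kernel_rows: list of k ints; bit j is the weight at column offset j.
    """
    k = len(kernel_rows)
    mask = (1 << k) - 1
    # Assumption: one table per kernel row, built once and reused everywhere.
    luts = [build_row_lut(w, k) for w in kernel_rows]
    out = []
    for r in range(len(image_rows) - k + 1):
        out_row = []
        for c in range(width - k + 1):
            acc = 0
            for i in range(k):
                window = (image_rows[r + i] >> c) & mask
                acc += luts[i][window]  # lookup replaces k multiply-adds
            out_row.append(acc)
        out.append(out_row)
    return out


def conv2d_direct(image_rows, kernel_rows, width):
    """Reference implementation with explicit +/-1 multiplications."""
    k = len(kernel_rows)

    def sign(bits, j):
        return 1 if (bits >> j) & 1 else -1

    out = []
    for r in range(len(image_rows) - k + 1):
        out_row = []
        for c in range(width - k + 1):
            acc = 0
            for i in range(k):
                for j in range(k):
                    acc += sign(image_rows[r + i], c + j) * sign(kernel_rows[i], j)
            out_row.append(acc)
        out.append(out_row)
    return out


# Arbitrary 5x5 binary image and 3x3 binary kernel, chosen for illustration.
image = [0b10110, 0b01101, 0b11100, 0b00011, 0b10101]
kernel = [0b101, 0b010, 0b111]
result = conv2d_lut(image, kernel, width=5)
```

In this sketch the table for a kernel row is shared across all output positions, which is the software analogue of reusing one lookup structure over time rather than replicating arithmetic units. How the hardware schedules that sharing is not stated in the abstract.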