刘偲旸, 蒋剑飞, 毛志刚. 一种指令集控制的神经网络加速器设计[J]. 微电子学与计算机, 2021, 38(5): 1-6.
LIU Si-yang, JIANG Jian-fei, MAO Zhi-gang. A programmable accelerator for convolution neural network[J]. Microelectronics & Computer, 2021, 38(5): 1-6.

一种指令集控制的神经网络加速器设计

A programmable accelerator for convolution neural network

  • Abstract: With the development of deep learning and neural network technology, hardware accelerators, which can fully exploit the parallelism of convolutional neural network (CNN) computation, are increasingly widely used for their high speed, low cost, and high fault tolerance. This paper proposes a novel algorithm that optimizes a CNN layer by layer, together with a corresponding instruction set. The algorithm can be used to find the best acceleration scheme for different networks given specific computing and storage resources. During optimization, different types of data can be quantized to half precision to reduce memory access. Based on a 40 nm CMOS process and the proposed algorithm, an instruction-set-controlled (programmable) CNN accelerator is designed; it reaches a peak performance of 416 GOP/s at a 200 MHz operating frequency. VGG16 inference is implemented on the accelerator as a case study, and the latency of the whole network is 116 ms.
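As a rough sanity check on the figures quoted in the abstract (416 GOP/s peak at 200 MHz, 116 ms for VGG16), the sketch below works through the arithmetic. It rests on assumptions the abstract does not state: the standard VGG16 layer configuration with a 224x224x3 input, fully connected layers included, and one multiply-accumulate (MAC) counted as two operations. The names `vgg16_macs`, `OPS_PER_MAC`, and the derived quantities are illustrative only and do not come from the paper.

```python
# Figures quoted in the abstract.
PEAK_OPS_PER_S = 416e9   # 416 GOP/s peak performance
CLOCK_HZ = 200e6         # 200 MHz working frequency
LATENCY_S = 0.116        # 116 ms for the whole network

# Assumption: one multiply-accumulate counts as two operations.
OPS_PER_MAC = 2

# Assumption: standard VGG16 layout; "M" marks a 2x2 max-pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
VGG16_FC = [4096, 4096, 1000]


def vgg16_macs(input_size=224, in_channels=3):
    """Count MACs for one VGG16 forward pass (3x3 convs, stride 1, same padding)."""
    size, channels, macs = input_size, in_channels, 0
    for item in VGG16_CONV:
        if item == "M":
            size //= 2                      # pooling halves the feature-map side
        else:
            macs += size * size * item * channels * 9
            channels = item
    features = size * size * channels       # flattened input to the first FC layer
    for out in VGG16_FC:
        macs += features * out
        features = out
    return macs


# Operations the accelerator must complete per clock cycle at peak.
ops_per_cycle = PEAK_OPS_PER_S / CLOCK_HZ
macs_per_cycle = ops_per_cycle / OPS_PER_MAC

# Effective throughput implied by the reported latency, and its ratio to peak.
total_ops = vgg16_macs() * OPS_PER_MAC
effective_ops_per_s = total_ops / LATENCY_S
utilization = effective_ops_per_s / PEAK_OPS_PER_S

print(f"ops per cycle at peak: {ops_per_cycle:.0f}")
print(f"MACs per cycle at peak: {macs_per_cycle:.0f}")
print(f"VGG16 MACs: {vgg16_macs() / 1e9:.2f} G")
print(f"effective throughput: {effective_ops_per_s / 1e9:.1f} GOP/s")
print(f"fraction of peak: {utilization:.2f}")
```

Under these assumptions the peak figure corresponds to 2080 operations per cycle, and the reported latency implies an effective throughput somewhat below peak. If the paper counts operations differently or uses another input size, the derived numbers change accordingly.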

     
