Citation: HE Wen, ZHU Yongxin, WANG Hui, HUANG Zunkai. Accelerator design and implementation for automatic searching neural network[J]. Microelectronics & Computer, 2021, 38(11): 88-94. DOI: 10.19304/J.ISSN1000-7180.2021.0279

Accelerator design and implementation for automatic searching neural network

Abstract: In recent years, automatically searched neural networks obtained through Neural Architecture Search (NAS) have performed particularly well on vision tasks. However, their more complex and variable convolution sizes and operation types limit their deployment on edge devices. To address this problem, this paper designs a high-frame-rate, highly flexible accelerator for automatically searched neural networks, represented by MnasNet, that targets the diverse computation patterns found in their search spaces. First, for their rich variety of convolution types, an Array Multiplexing Mixed Convolution (AMMC) structure is proposed, which flexibly parallelizes different convolutions in different directions without adding extra MAC arrays. Second, a variable-precision Configurable Multiple Selection Activation (CMA) structure is proposed, which fits the multiple activation functions used by such networks with high precision. With a 32×32 MAC array, the accelerator is deployed on a Xilinx ZCU102 platform, where it reaches a clock frequency of 200 MHz with a power consumption of 3.2 W. Running MnasNet-a1 on 224×224 images, it achieves an actual frame rate of 272.9 fps.
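
The reported figures imply a few derived quantities that are useful when comparing accelerators: peak MAC throughput, clock cycles spent per frame, energy per frame, and frames per watt. The sketch below only does that arithmetic on the numbers stated in the abstract. It assumes each MAC unit completes one multiply-accumulate per cycle, and the helper name `derived_metrics` and its optional `macs_per_frame` argument are hypothetical illustrations, not part of the paper. The workload size is left to the caller because the abstract does not state it.

```python
# Arithmetic on the figures reported in the abstract.
# Assumption (not from the paper): each MAC unit performs one
# multiply-accumulate per clock cycle.

PE_ROWS = 32        # reported MAC array size: 32 x 32
PE_COLS = 32
CLOCK_HZ = 200e6    # reported clock frequency: 200 MHz
POWER_W = 3.2       # reported accelerator power: 3.2 W
FPS = 272.9         # reported MnasNet-a1 frame rate at 224x224


def derived_metrics(macs_per_frame=None):
    """Return figures implied by the reported numbers (hypothetical helper).

    macs_per_frame is an optional, caller-supplied workload size; it is an
    assumption, not a value from the paper. When given, the implied average
    MAC-array utilization is also computed.
    """
    n_mac = PE_ROWS * PE_COLS                   # number of MAC units
    peak_gmacs = n_mac * CLOCK_HZ / 1e9         # peak GMAC/s
    out = {
        "mac_units": n_mac,
        "peak_gmac_per_s": peak_gmacs,
        "cycles_per_frame": CLOCK_HZ / FPS,     # clock cycles per image
        "energy_mj_per_frame": POWER_W / FPS * 1e3,
        "fps_per_watt": FPS / POWER_W,
    }
    if macs_per_frame is not None:
        achieved_gmacs = macs_per_frame * FPS / 1e9
        out["achieved_gmac_per_s"] = achieved_gmacs
        out["utilization"] = achieved_gmacs / peak_gmacs
    return out


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.4g}")
```

With the reported values, the peak works out to 1024 × 200 MHz = 204.8 GMAC/s. Passing an assumed per-frame MAC count (for example, the multiply-add count published for MnasNet-A1, treated here as an outside assumption) gives the average array utilization that the 272.9 fps result would correspond to.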
