WANG B Y, YANG Z J, XIE C, et al. Optimal design of computing resources for CNN convolution layer hardware[J]. Microelectronics & Computer, 2024, 41(7): 89-95. doi: 10.19304/J.ISSN1000-7180.2023.0436


Optimal design of computing resources for CNN convolution layer hardware


    Abstract: Traditional dedicated Convolutional Neural Network (CNN) accelerators suffer from low hardware resource utilization when implementing convolution-layer operator reconstruction, data reuse, and compute-resource reuse. To address this, a hardware architecture combining a dynamic register file with a reconfigurable PE array is designed; by optimizing the dataflow, the load is balanced across PE units, improving the utilization of convolution-layer computing resources. The architecture can flexibly deploy odd convolution kernels with sizes from 0 to 11 and strides from 1 to 10, and supports multi-channel parallel convolution and input-data reuse. The design is implemented in the Verilog hardware description language and functionally verified in a UVM environment. Experiments show that, when accelerating the convolutional layers of the AlexNet model, peak throughput improves by 9.5% to 64.3% over related work, and when mapping convolution kernels of different sizes and strides from five classical neural networks, average PE utilization improves by 4% to 11% over related work.
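The abstract describes mapping multi-channel convolutions with configurable odd kernel sizes and strides onto a PE array. As a functional reference for what such hardware computes (not the paper's architecture or dataflow), the following is a minimal NumPy sketch of a multi-channel 2D convolution with configurable kernel size and stride; all names and shapes are illustrative assumptions.

```python
import numpy as np

def conv2d_multichannel(x, w, stride):
    """Reference multi-channel 2D convolution, valid padding.

    x: input feature map, shape (C_in, H, W)
    w: weights, shape (C_out, C_in, K, K), K odd (up to 11 per the abstract)
    stride: convolution stride (1 to 10 per the abstract)
    """
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    # Output size follows the standard valid-convolution formula.
    oh = (h - k) // stride + 1
    ow = (wd - k) // stride + 1
    y = np.zeros((c_out, oh, ow))
    for co in range(c_out):
        for i in range(oh):
            for j in range(ow):
                # Each output pixel accumulates a K*K window over all
                # input channels -- the work one PE group would perform.
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                y[co, i, j] = np.sum(patch * w[co])
    return y

# Example: 2 input channels, 5x5 map, 3 output channels, 3x3 kernel, stride 2.
x = np.arange(50, dtype=float).reshape(2, 5, 5)
w = np.ones((3, 2, 3, 3))
y = conv2d_multichannel(x, w, 2)  # y has shape (3, 2, 2)
```

The output dimension `(H - K) // stride + 1` is why irregular kernel/stride combinations leave some PEs idle on a fixed array; the paper's reconfigurable PE array and dataflow optimization target exactly that utilization loss.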
