FPGA Parallel Structure Design of Convolutional Neural Network (CNN) Algorithm
-
Abstract
This paper presents an FPGA parallel structure design for the CNN algorithm. The design first exploits the inherent parallelism of CNN computation, together with loop transformation methods, to realize a convolution circuit that operates as an efficient parallel pipeline. Double buffering, which reduces memory access time, is then used to implement cache arrays at the input and output stages, improving the computational performance of the circuit, measured in GOPS (giga-operations per second, i.e., billions of operations per second). The activation function is also optimized: the hardware circuit for the sigmoid activation is designed with a piecewise fitting method that combines a lookup table with polynomials, so that the approximation does not reduce recognition accuracy. Experimental results show that, with a 150 MHz input clock, the overall performance of the circuit improves from 15.87 GOPS to 20.62 GOPS (a gain of roughly 30%), and the recognition rate on the MNIST data set reaches 98.81%.
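To illustrate the idea of fitting the sigmoid with a lookup table and polynomials, the following Python sketch models one possible scheme in software. It is not the circuit described in this paper: the saturation bound `X_MAX`, the segment count `N_SEG`, the use of first-order polynomials, and all function names are assumptions chosen for illustration, and floating-point arithmetic stands in for the fixed-point arithmetic a hardware design would use.

```python
import math

def sigmoid(x):
    """Reference sigmoid, used to build the table and to measure error."""
    return 1.0 / (1.0 + math.exp(-x))

# Assumed parameters (not taken from the paper).
X_MAX = 8.0            # beyond this magnitude the output saturates
N_SEG = 32             # number of equal-width segments on [0, X_MAX)
STEP = X_MAX / N_SEG   # segment width (0.25 here)

def build_table():
    """Lookup table holding one (slope, intercept) pair per segment.

    Each pair defines a first-order polynomial that passes through the
    exact sigmoid values at the two ends of its segment.
    """
    table = []
    for i in range(N_SEG):
        x0 = i * STEP
        y0, y1 = sigmoid(x0), sigmoid(x0 + STEP)
        slope = (y1 - y0) / STEP
        table.append((slope, y0 - slope * x0))
    return table

TABLE = build_table()

def sigmoid_approx(x):
    """Piecewise approximation: table lookup, then polynomial evaluation."""
    ax = abs(x)
    if ax >= X_MAX:
        y = 1.0                                   # saturated region
    else:
        slope, intercept = TABLE[int(ax / STEP)]  # select the segment
        y = slope * ax + intercept                # evaluate its polynomial
    # Symmetry sigmoid(-x) = 1 - sigmoid(x) halves the table size.
    return y if x >= 0 else 1.0 - y

def max_abs_error(samples=4001, lo=-10.0, hi=10.0):
    """Largest absolute deviation from the reference on a uniform grid."""
    worst = 0.0
    for k in range(samples):
        x = lo + (hi - lo) * k / (samples - 1)
        worst = max(worst, abs(sigmoid_approx(x) - sigmoid(x)))
    return worst

if __name__ == "__main__":
    print("max abs error on [-10, 10]:", max_abs_error())
```

With these assumed parameters the approximation error stays below 1e-3 over the sampled range. A finer table or higher-order polynomials would trade additional hardware resources for a smaller error.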
-
-