GAN Zhiying, XU Dawen. Research on OCR model compression scheme for AIoT chips[J]. Microelectronics & Computer, 2022, 39(11): 110-117. DOI: 10.19304/J.ISSN1000-7180.2022.0241

Research on OCR model compression scheme for AIoT chips

  • Deep learning-based OCR models typically combine a CNN with an RNN/LSTM. They are computationally intensive and carry many weight parameters, so meeting inference performance requirements on edge devices demands substantial computational resources. General-purpose processors such as CPUs and GPUs cannot satisfy both processing-speed and power requirements, and are costly. With the popularity of deep learning, neural processing units (NPUs), which provide the high-throughput compute needed for the matrix operations in neural networks, are becoming common in embedded and edge devices. Taking a CRNN-based OCR model as an example, this paper presents a solution for AIoT chips that reduces network-parameter redundancy through two compression algorithms, pruning and quantization, cutting computational overhead while still obtaining a compressed model with high accuracy and robustness, so that the model can be deployed on an NPU. Experimental results show that quantizing the pruned and fine-tuned model reduces accuracy by no more than 3% at 78% sparsity, and compresses the model from 15.87 MB to 3.13 MB. Deployed on the NPU, the compressed model achieves 28.87x and 6.1x latency speedups over the CPU and GPU implementations, respectively.
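The two compression steps named in the abstract, magnitude pruning followed by low-bit quantization, can be sketched in miniature as follows. This is an illustrative example only, not the paper's implementation: the threshold rule, the symmetric 8-bit scheme, and the toy weight matrix `W` are all assumptions for demonstration.

```python
# Sketch of the two compression steps the abstract describes:
# (1) magnitude-based pruning to a target sparsity,
# (2) symmetric 8-bit linear quantization of the surviving weights.
# Both the pruning rule and the quantization scheme are illustrative
# assumptions, not the authors' exact method.

def prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights."""
    flat = sorted(abs(w) for row in weights for w in row)
    k = int(len(flat) * sparsity)
    threshold = flat[k - 1] if k > 0 else 0.0
    return [[0.0 if abs(w) <= threshold else w for w in row]
            for row in weights]

def quantize_int8(weights):
    """Symmetric linear quantization: q = round(w / scale), scale = max|w|/127."""
    max_abs = max(abs(w) for row in weights for w in row) or 1.0
    scale = max_abs / 127.0
    q = [[round(w / scale) for w in row] for row in weights]
    return q, scale

# Toy 2x3 weight matrix (hypothetical values).
W = [[0.42, -0.05, 0.91],
     [-0.88, 0.02, -0.13]]

Wp = prune(W, 0.5)          # ~50% of entries zeroed by magnitude
Wq, s = quantize_int8(Wp)   # int8 codes plus a dequantization scale
```

Storing `Wq` as int8 plus one float scale per tensor is what yields the roughly 4x size reduction over float32 weights; the additional shrinkage reported in the paper (15.87 MB to 3.13 MB) comes from also exploiting the pruning-induced sparsity.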