Overview of DSP Architecture Development

宋文娜; 徐东君; 陈亮

doi:10.19304/J.ISSN1000-7180.2022.0456

Abstract: A digital signal processor (DSP) is a specialized microprocessor for digital signal processing, with important applications in communications, automation, radar, aerospace, and other fields. This paper systematically reviews the development and current state of DSP architecture, introduces the DSP products of the major manufacturers and their performance, and summarizes the main structural characteristics of DSP chips. It also analyzes the main techniques used in existing DSP architectures to improve data-level and instruction-level parallelism, including the Harvard architecture, hardware multipliers, SIMD, VLIW, and superscalar execution. In light of current DSP application requirements, the paper proposes three directions for DSP architecture research: (1) moving toward ultra-high-performance DSPs by increasing data and instruction parallelism, that is, improving vector and scalar parallelism, supporting tensor computation, and integrating dedicated control paths and functional units for neural network operators, thereby improving AI computing capability; (2) starting from the instruction system, combining a variable-length instruction set with superscalar techniques to achieve instruction parallelism while adding computation-flow control instructions that can be extended to suit neural network algorithms, which improves the mapping of AI algorithms, improves code density (reducing code size), relieves storage pressure and instruction-fetch bandwidth, lowers cost, and improves real-time processing for edge intelligence applications; (3) supporting distributed memory structures that provide compression and concurrent access for sparse neural networks, which improves on-chip deployment of edge intelligence and network-layer multi-channel parallel computing capability.
