Design and implementation of high performance FFT processor based on conflict-free access rules
-
Abstract: High-performance fast Fourier transform (FFT) processors are widely used in real-time signal processing systems such as radar and communications. This paper presents a high-performance mixed-radix FFT processor that optimizes the conflict-free memory access rules and combines the radix-2 and radix-8 decimation-in-time (DIT) FFT algorithms. The processor uses a memory-based architecture whose main components are a butterfly unit, a memory unit, and a control unit. Optimizing the conflict-free access rules and the twiddle factor generation scheme improves both computation speed and hardware efficiency. The processor is implemented in SMIC 40 nm standard CMOS technology. Simulation results show that it operates at over 500 MHz, occupies a core area of 1.76 × 0.85 mm², and achieves an SNR above 136 dB. For a 32K-point FFT, it is about 4 times faster than FFT processors of the same type.
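To make the notion of conflict-free access concrete, the following is a minimal sketch of the classic digit-sum bank mapping for an in-place radix-r FFT. It is not the optimized rule proposed in the paper, which this abstract does not spell out. The function names (`digits`, `bank_of`, `butterfly_addresses`, `is_conflict_free`) are hypothetical, and the sketch assumes a pure radix-r transform of length r^n with one memory bank per butterfly input.

```python
def digits(addr, radix, n_digits):
    """Return the base-`radix` digits of `addr`, least significant first."""
    out = []
    for _ in range(n_digits):
        out.append(addr % radix)
        addr //= radix
    return out


def bank_of(addr, radix=8, n_digits=5):
    """Assumed mapping: bank index = (sum of address digits) mod radix."""
    return sum(digits(addr, radix, n_digits)) % radix


def butterfly_addresses(base, stage, radix=8):
    """Addresses read by one butterfly at `stage`.

    `base` must have a zero in digit position `stage`; the butterfly
    inputs differ from it only in that digit.
    """
    return [base + k * radix ** stage for k in range(radix)]


def is_conflict_free(radix=8, n_digits=3):
    """Check that every butterfly in every stage hits `radix` distinct banks."""
    n = radix ** n_digits
    for stage in range(n_digits):
        stride = radix ** stage
        for base in range(n):
            if (base // stride) % radix != 0:
                continue  # not the first input of a butterfly
            banks = {
                bank_of(a, radix, n_digits)
                for a in butterfly_addresses(base, stage, radix)
            }
            if len(banks) != radix:
                return False
    return True
```

Because the inputs of one butterfly differ in exactly one digit, their digit sums take `radix` consecutive values, so they fall into distinct banks and can be read in the same cycle. With `radix=8` and `n_digits=5` the check covers a 32K-point transform (8^5 = 32768). Mixed-radix lengths such as 1K = 8^3 × 2 need an additional radix-2 stage that this sketch does not model.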
-
Key words:
- FFT
- radix-8 conflict-free access rules
- mixed radix
- ASIC
-
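Table 1. Twiddle factor error analysis

| SNR (dB) | Mean error | Maximum error |
| --- | --- | --- |
| 147.2107 | 1.7143e-08 | 1.1921e-07 |

Table 2. Error analysis

| Sequence length | Component | SNR (dB) | Mean error | Maximum error |
| --- | --- | --- | --- | --- |
| 1K | Real | 138.13 | 1.0110e-5 | 4.5776e-5 |
| 1K | Imaginary | 137.94 | 1.0750e-5 | 4.5776e-5 |
| 2K | Real | 137.35 | 1.5911e-5 | 6.8665e-5 |
| 2K | Imaginary | 137.27 | 1.6534e-5 | 1.2207e-4 |
| 8K | Real | 136.27 | 3.7476e-5 | 2.4414e-4 |
| 8K | Imaginary | 136.56 | 3.7565e-5 | 2.4414e-4 |
| 16K | Real | 136.09 | 5.6936e-5 | 3.6621e-4 |
| 16K | Imaginary | 136.00 | 5.6542e-5 | 4.8828e-4 |
| 32K | Real | 136.25 | 7.5934e-5 | 4.8828e-4 |
| 32K | Imaginary | 136.33 | 7.6528e-5 | 9.7656e-4 |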
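The SNR, mean error, and maximum error figures in the tables can be read against the following sketch of how such metrics are commonly computed. The exact definitions used in the paper are not given in this section, so the formulas are assumptions: SNR is taken as the ratio of reference signal energy to error energy in dB, and the errors are absolute differences against a high-precision reference. The function name `error_metrics` is hypothetical.

```python
import math


def error_metrics(reference, measured):
    """Return (snr_db, mean_abs_error, max_abs_error) for two real sequences.

    Assumed definitions (not taken from the paper):
      snr_db = 10 * log10(sum(ref^2) / sum((ref - meas)^2))
      errors = |ref - meas| per sample
    """
    if len(reference) != len(measured) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")

    signal_energy = sum(r * r for r in reference)
    abs_errors = [abs(r - m) for r, m in zip(reference, measured)]
    noise_energy = sum(e * e for e in abs_errors)

    # An exact match has no error energy, so the SNR is unbounded.
    if noise_energy == 0.0:
        snr_db = math.inf
    else:
        snr_db = 10.0 * math.log10(signal_energy / noise_energy)

    mean_error = sum(abs_errors) / len(abs_errors)
    max_error = max(abs_errors)
    return snr_db, mean_error, max_error
```

To mirror the layout of Table 2, the function would be called once on the real parts and once on the imaginary parts of the FFT output, with a double-precision software FFT as the reference.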