常立博, 王时雨. 基于轻核阵列机的FFT算法并行化研究与实现[J]. 微电子学与计算机, 2018, 35(1): 100-105.
引用本文: 常立博, 王时雨. 基于轻核阵列机的FFT算法并行化研究与实现[J]. 微电子学与计算机, 2018, 35(1): 100-105.
CHANG Li-bo, WANG Shi-yu. Research and Implementation of Parallel FFT Algorithm Based on Light Core Array[J]. Microelectronics & Computer, 2018, 35(1): 100-105.
Citation: CHANG Li-bo, WANG Shi-yu. Research and Implementation of Parallel FFT Algorithm Based on Light Core Array[J]. Microelectronics & Computer, 2018, 35(1): 100-105.

基于轻核阵列机的FFT算法并行化研究与实现

Research and Implementation of Parallel FFT Algorithm Based on Light Core Array

  • 摘要: 目前普遍采用基于流水的单路径延时反馈结构, 或基于存储结构实现快速傅里叶变换(Fast Fourier Transform, FFT).前一种结构效率高但缺乏灵活性, 而后一种结构通用性较好但性能较差.首先提出了一种FFT并行实现的算法, 然后在并行图形阵列机(Parallel Array Architecture for Graphics, PAAG) 平台上实现了基2时间抽取的FFT (Decimation-In-Timer FFT, DIT-FFT) 算法; 最后将长度为512的DIT-FFT算法分别映射到1个PE、4个PE、8个PE和16个PE上实现.分析实验结果显示: 随着PE个数的增加, 加速比呈曲线上升的趋势; 但随着PE个数的不断增加, 算法执行速度增长开始变得缓慢, 当映射到8个PE上时, 最大加速比可以达到5.98.

     

    Abstract: At present, the fast Fourier transform is realized based on the flow-based single-path delay feedback or storage structure, the former structure is highly efficient but lacks flexibility, and the latter is more versatile but less robust.In this paper, firstly, an algorithm is implemented in parallel with FFT, and then the Decitation-InTimer FFT (DIT-FFT) algorithm is implemented on the Parallel Array Architecture for Graphics (PAAG) platform.Finally, the DIT-FFT algorithm with length of 512 is mapped to 1 PE, 4 PE, 8 PEs and 16 PEs respectively.The results show that with the increase of the number of PEs, the acceleration rate increases with the increase of the number of PEs.However, as the number of PEs increases, the speed of the implementation of the algorithm becomes slow, and when it is mapped to 8 PEs, The acceleration ratio can reach 5.98.

     

/

返回文章
返回