Citation: RONG Yuan, JIANG Xianyang. One high-efficiency full-word modular multiplier based on multi-layer Karatsuba algorithm[J]. Microelectronics & Computer, 2022, 39(10): 97-102. DOI: 10.19304/J.ISSN1000-7180.2022.0240

One high-efficiency full-word modular multiplier based on multi-layer Karatsuba algorithm

Abstract: Modular multiplication is the core operation of many cryptosystems. It is a typical computation-intensive task and is often the performance bottleneck of the system, so various dedicated accelerator circuits for modular multiplication have been proposed. To further improve circuit performance, this paper proposes a full-word Montgomery modular multiplier built on the multi-layer Karatsuba algorithm for large-number multiplication, which improves the efficiency of large-number arithmetic in high-radix algorithms. The proposed multi-layer Karatsuba multiplier structure reduces the granularity of the multiplication operations and keeps hardware utilization at its maximum when large-number multiplications are executed back to back. In addition, splitting the computation into segments according to data bit-width raises the operating frequency of the circuit. Synthesis results on a Virtex-7 FPGA show that the circuit reaches a clock frequency of 250 MHz and completes a 256-bit Montgomery modular multiplication in 33 cycles, for a latency of 132 ns. To the best of our knowledge, the overall performance of the proposed full-word modular multiplier exceeds that of the best designs reported to date. The design method offers a practical reference for using the multi-layer Karatsuba algorithm to reduce both the area and the critical-path length of hardware multipliers.
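The arithmetic behind the abstract can be illustrated in software. The following is a minimal Python sketch, not the authors' hardware architecture: it shows recursive (multi-layer) Karatsuba splitting feeding a full-word Montgomery multiplication with R = 2^256. The function names, the 16-bit base-case width, and the example modulus are assumptions chosen for illustration; the abstract does not state the paper's actual layer count, segment widths, or pipeline.

```python
def karatsuba(a: int, b: int, bits: int, base_bits: int = 16) -> int:
    """Multiply non-negative a, b < 2**bits by recursive Karatsuba splitting.

    base_bits is an assumed width (>= 4) below which a plain multiply is used;
    in hardware this would correspond to a small native multiplier.
    """
    if bits <= base_bits:
        return a * b
    half = bits // 2
    hi_bits = bits - half                # width of the upper part
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    lo = karatsuba(a_lo, b_lo, half, base_bits)
    hi = karatsuba(a_hi, b_hi, hi_bits, base_bits)
    # Three sub-multiplications instead of four: the cross term is recovered
    # from the product of the sums, which need one extra bit.
    mid = karatsuba(a_lo + a_hi, b_lo + b_hi, hi_bits + 1, base_bits) - lo - hi
    return (hi << (2 * half)) + (mid << half) + lo


def montgomery_setup(n: int, k: int = 256):
    """Return (R, n') with R = 2**k and n' = -n^{-1} mod R; n must be odd."""
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r       # modular inverse via pow, Python 3.8+
    return r, n_prime


def mont_mul(a: int, b: int, n: int, n_prime: int, k: int = 256) -> int:
    """Full-word Montgomery product a * b * R^{-1} mod n for 0 <= a, b < n."""
    mask = (1 << k) - 1
    t = karatsuba(a, b, k)                       # full 2k-bit product
    m = karatsuba(t & mask, n_prime, k) & mask   # m = (t mod R) * n' mod R
    u = (t + karatsuba(m, n, k)) >> k            # exact division by R
    return u - n if u >= n else u                # single conditional subtraction
```

Each Montgomery product here uses three full-width multiplications, all routed through the same Karatsuba routine, which is one way to read the abstract's point about keeping a shared multiplier busy across consecutive large-number multiplications. The timing figures in the abstract are also self-consistent: 250 MHz corresponds to a 4 ns clock period, and 33 cycles × 4 ns = 132 ns.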

     
