Citation: PAN D S, HUANG J M, MA C. A highly energy efficient local clock network design based on improved K-means algorithm[J]. Microelectronics & Computer, 2023, 40(8): 101-107. doi: 10.19304/J.ISSN1000-7180.2022.0623

A highly energy efficient local clock network design based on improved K-means algorithm

Abstract: Block-level clock networks in advanced processors face widely distributed and heavy clock loads, long clock insertion delay, hard-to-control skew, and high dynamic power. To address these problems, this paper proposes an energy-efficient local clock network design method. Its core is TKDLO (Timing driven K-means based Driver Location Optimization), a clock driver location optimization algorithm built on a load-aware K-means algorithm and implemented within the current EDA flow. TKDLO optimizes the placement of local clock-gating driver cells without affecting timing and reduces clock network skew. In tests on designs with different flip-flop counts, module-level clock insertion delay improves by more than 15% and clock skew by more than 30%. Taking the clock network of a memory access execution block as an example, the proposed method improves clock insertion delay and skew by more than 50% compared with a traditional CTS implementation; the equivalent frequency of the whole design rises by 14%, average power consumption falls by 28%, and the energy efficiency of the block improves by 58.7%. Compared with a fishbone clock structure based on flip-flop clustering, the proposed method raises the module frequency by 7.6% and improves energy efficiency by 14.2%, at the cost of a 15.2% increase in clock insertion delay and a 5% increase in power.
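The abstract does not spell out how TKDLO works, so the following is only a minimal sketch of the general idea it names: a K-means variant in which each driver location is the load-weighted centroid of the sinks it drives. It is not the authors' algorithm. The function name `weighted_kmeans`, the inputs `sinks` and `loads`, and the use of squared Euclidean distance are all assumptions made for illustration, and the timing constraints that TKDLO takes into account are omitted entirely.

```python
import random


def weighted_kmeans(sinks, loads, k, iters=50, seed=0):
    """Load-weighted K-means (Lloyd's algorithm), illustrative only.

    sinks: list of (x, y) flip-flop clock pin positions (hypothetical input)
    loads: list of positive capacitive loads, one per sink (hypothetical input)
    Returns (centers, assign): candidate driver locations and, for each sink,
    the index of the driver it is assigned to.
    """
    rng = random.Random(seed)
    centers = rng.sample(sinks, k)  # start from k distinct sink positions
    assign = [0] * len(sinks)

    def nearest(p):
        # Index of the closest center by squared Euclidean distance.
        return min(
            range(k),
            key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
        )

    for _ in range(iters):
        # Assignment step: each sink joins its nearest driver.
        new_assign = [nearest(p) for p in sinks]

        # Update step: move each driver to the load-weighted centroid.
        new_centers = []
        for c in range(k):
            members = [i for i, a in enumerate(new_assign) if a == c]
            if not members:
                new_centers.append(centers[c])  # keep an empty cluster's center
                continue
            total = sum(loads[i] for i in members)
            cx = sum(loads[i] * sinks[i][0] for i in members) / total
            cy = sum(loads[i] * sinks[i][1] for i in members) / total
            new_centers.append((cx, cy))

        converged = new_assign == assign and new_centers == centers
        assign, centers = new_assign, new_centers
        if converged:
            break
    return centers, assign


if __name__ == "__main__":
    sinks = [(0, 0), (0, 2), (10, 0), (10, 2)]
    loads = [1, 3, 1, 1]  # the sink at (0, 2) is three times heavier
    centers, assign = weighted_kmeans(sinks, loads, k=2)
    print(sorted(centers), assign)
```

In this toy input the left-hand driver settles at y = 1.5 instead of the geometric midpoint y = 1.0, because the heavier sink pulls the centroid toward itself. That is the sense in which a load-aware centroid differs from plain K-means; a real flow would additionally have to check skew and setup/hold timing before accepting a driver move.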

     
