Design of Speculative Clustered L0 Caches in Clustered Processors

Abstract: Clustering is an effective technique for further improving the performance of superscalar processors. It allows a larger instruction window and a wider issue width, but it also places severe demands on the cache system, which must offer lower memory access latency and better scalability. With speculative clustered L0 caches, the processor speculatively accesses a small, simple, and fast L0 cache inside each cluster on every memory access, which largely hides the access latency of the lower-level, higher-capacity cache. Simulation results show that in an 8-cluster (1x8) clustered processor, a 4 kB, 2-way set-associative clustered L0 cache improves processor performance (IPC) by 5.6% on average and by more than 20% on some benchmark programs.
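To make the cache geometry in the abstract concrete, the following is a minimal sketch of a single cluster's L0 cache as a 4 kB, 2-way set-associative structure. Only the capacity and associativity come from the abstract; the 32-byte line size, the LRU replacement policy, and the names `L0Cache` and `access` are assumptions made for illustration. The sketch models hit/miss behavior only and does not model the speculative access or any recovery mechanism of the actual design.

```python
from collections import OrderedDict


class L0Cache:
    """Toy model of one cluster's L0 cache.

    size_bytes and ways follow the abstract (4 kB, 2-way).
    line_bytes=32 and LRU replacement are assumptions, not from the paper.
    """

    def __init__(self, size_bytes=4096, ways=2, line_bytes=32):
        self.ways = ways
        self.line_bytes = line_bytes
        self.num_sets = size_bytes // (ways * line_bytes)
        # One ordered dict per set: keys are tags, ordered from LRU to MRU.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Return True on an L0 hit; on a miss, fill the line and return False."""
        line = addr // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # evict the least recently used tag
        entries[tag] = True
        return False
```

Under these assumptions the cache has 64 sets, so addresses that are 2048 bytes apart map to the same set and compete for its two ways. A miss returned by `access` corresponds to the case where the request must be served by the lower-level cache.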