Abstract:
Clustering is an attractive technique for scaling large monolithic superscalar processors, allowing more in-flight instructions and a wider issue width. Such processors therefore need a cache structure with low memory access latency and high scalability. With speculative clustered L0 caches, a clustered processor speculatively accesses a small, fast, and simple L0 cache within each cluster, so that the access latency of the lower-level, high-capacity cache is hidden. As a result, memory access latency is reduced. Simulation studies show that a 4 kB, 2-way set-associative L0 cache in a 1x8 clustered processor provides a 5.6% IPC improvement, with some programs achieving a 20% improvement.