JSSC 2012, Issue 1 · Power Management · 32 nm

Design of the Two-Core x86-64 AMD “Bulldozer” Module in 32 nm SOI CMOS

Presents the circuit design innovations in the AMD Bulldozer module that raise frequency and improve energy efficiency.

32nm SOI CMOS, 213 million transistors, 30.9 mm²

x86-64 · Bulldozer · SOI CMOS · Power optimization · High-frequency design

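▸ Innovation 1: Two independent integer cores with shared units (system innovation) - The two integer cores share the fetch, decode, floating-point, and L2 cache units. This raises single-threaded performance and multi-threaded throughput, and improves power efficiency and area efficiency by more than 20% each compared to fully replicated CPU cores.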

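▸ Innovation 2: New soft-edged flop (SEF) family (circuit innovation) - The SEF family optimizes the timing tolerance around the clock edge to balance high-frequency operation against low power. The number of FO4 gate delays per cycle is reduced by more than 20% compared to a previous processor in the same technology, which raises the processor's clock frequency.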

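▸ Innovation 3: Distinctive power optimization approaches (methodology innovation) - The design uses dynamic voltage and frequency scaling (DVFS) and fine-grained clock gating, combined with the high-K metal gate characteristics of the 32 nm SOI CMOS process, to achieve a 15% frequency increase within the same power budget.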

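▸ Innovation 4: High-density integration (process innovation) - The module integrates 213 million transistors in 30.9 mm², using innovative place-and-route and power-grid design to achieve higher transistor density and better heat dissipation.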
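
The density implied by the two figures quoted in this entry (213 million transistors, 30.9 mm²) can be checked with a few lines of arithmetic. This is only a sketch: the helper name `transistor_density` is hypothetical, and the result is an average over the whole module, so it says nothing about how density differs between logic and cache.

```python
# Figures quoted in this entry: 213 million transistors in 30.9 mm^2.
TRANSISTORS = 213e6
AREA_MM2 = 30.9


def transistor_density(transistors: float, area_mm2: float) -> float:
    """Average transistors per mm^2 (hypothetical helper, not from the paper)."""
    if area_mm2 <= 0:
        raise ValueError("area must be positive")
    return transistors / area_mm2


density = transistor_density(TRANSISTORS, AREA_MM2)
print(f"{density / 1e6:.2f} million transistors per mm^2")  # prints 6.89 million transistors per mm^2
```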

Abstract

This paper describes key circuit innovations in a new x86-64 micro-architecture [1] AMD code-named “Bulldozer” [2], [3]. It is implemented in 32 nm high-K metal gate SOI CMOS. It occupies 30.9 mm², contains 213 million transistors, reduces the number of FO4 gates per cycle by more than 20% compared to a previous processor in the same technology [4], and demonstrates superior frequency scaling across voltage. The module includes two independent integer cores but shares the fetch, decode, floating-point, and L2 cache units to maximize single-threaded performance and multi-threaded throughput while significantly improving power and area efficiency compared to fully replicated CPU cores. The design includes a new soft-edged flop (SEF) family to enable high frequency and low power. Achieving power efficiency in combination with high-frequency design is a particular challenge, and this paper describes several of the unique approaches to power optimization that have been employed in the design. The gate-count reduction and power optimization enable faster frequencies in the same power envelope compared to previous designs.
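
To make the link between gate count and frequency concrete, the sketch below converts a reduction in FO4 (fan-out-of-4) gate delays per cycle into a relative frequency gain. It rests on a simplifying assumption that is not stated in the abstract: cycle time is taken to be proportional to the number of FO4 delays per cycle, with the delay of a single FO4 gate unchanged (same technology and voltage). The function name `frequency_gain` is hypothetical, and real silicon gains also depend on power limits, wire delay, and clocking overhead.

```python
def frequency_gain(fo4_reduction: float) -> float:
    """Relative frequency gain for a fractional cut in FO4 delays per cycle.

    Assumption (not from the paper): cycle time scales linearly with the
    FO4 count per cycle, and the delay of one FO4 gate stays the same.
    """
    if not 0.0 <= fo4_reduction < 1.0:
        raise ValueError("fo4_reduction must be in [0, 1)")
    return 1.0 / (1.0 - fo4_reduction)


# The abstract quotes "more than 20%", so 0.20 is the lower bound.
gain = frequency_gain(0.20)
print(f"20% fewer FO4 delays per cycle -> {gain:.2f}x frequency")  # prints 1.25x
```

Under this assumption, the abstract's "more than 20%" reduction corresponds to at least a 1.25x frequency ratio at equal FO4 delay.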