JSSC 2007, Issue 10 · Memory · 0.18 μm

An Energy-Efficient Mobile Vertex Processor With Multithread Expanded VLIW Architecture and

Proposes an energy-efficient mobile vertex processor that uses a multithreaded expanded VLIW architecture and vertex caches.

0.18 μm CMOS, 100 MHz, 120 Mvertices/s, 44.7% energy reduction

Vertex processor · VLIW architecture · Multithreading · Energy efficiency · OpenGL ES 2.0

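▸ Innovation 1: Multithreaded datapath eliminates data hazards (system-level innovation). Four-thread parallel processing avoids the data dependencies and resource conflicts of a conventional single-threaded architecture, improves instruction execution efficiency, and supports geometry performance of up to 120 Mvertices/s.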

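▸ Innovation 2: Four-threaded, four-issue expanded VLIW parallel architecture (architectural innovation). An expanded very long instruction word (VLIW) design combined with multithreaded scheduling exploits instruction-level parallelism (ILP) and thread-level parallelism (TLP) together. At 100 MHz the processor is 3.3 times faster than the previous result.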

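▸ Innovation 3: Dedicated vertex cache design (circuit-level innovation). The vertex caches reduce the bandwidth required between the host and the vertex processor and keep geometry operations local, which accelerates them. The proposed architecture with the vertex caches reduces average total energy dissipation by 44.7% compared with a conventional single-threaded SIMD architecture.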

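▸ Innovation 4: Support for OpenGL ES 2.0 and Vertex Shader Model 3.0 (compatibility innovation). A standards-conformant hardware architecture ensures compatibility with mainstream graphics APIs and broadens the range of mobile multimedia applications the processor can serve.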
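The first innovation point, that interleaving four threads removes data hazards, can be illustrated with a toy model. The sketch below is not the paper's pipeline: it assumes a single-issue pipeline, a hypothetical result latency of 4 cycles, and a worst case in which every instruction depends on the previous instruction of the same thread. The function name and all parameters are illustrative.

```python
def cycles_to_issue_all(num_threads, instrs_per_thread, latency):
    """Cycles needed to issue every instruction on a single-issue pipeline
    with round-robin thread interleaving.

    Assumption (hypothetical): each instruction depends on the previous
    instruction of the same thread, whose result is available `latency`
    cycles after it issues.
    """
    remaining = [instrs_per_thread] * num_threads
    ready_at = [0] * num_threads  # earliest cycle each thread may issue again
    cycle = 0
    next_thread = 0
    while any(remaining):
        # Scan threads in round-robin order and issue from the first ready one.
        for offset in range(num_threads):
            t = (next_thread + offset) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency
                next_thread = (t + 1) % num_threads
                break
        # If no thread was ready, this cycle is a stall (pipeline bubble).
        cycle += 1
    return cycle


if __name__ == "__main__":
    LATENCY = 4  # assumed value, not taken from the paper
    single = cycles_to_issue_all(num_threads=1, instrs_per_thread=16, latency=LATENCY)
    four = cycles_to_issue_all(num_threads=4, instrs_per_thread=4, latency=LATENCY)
    print("single-threaded:", single, "cycles")
    print("four-threaded:  ", four, "cycles")
```

Both runs issue 16 instructions in total. In this model the four-threaded run never stalls, because by the time a thread's turn comes around again its previous result is already available, while the single-threaded run waits out the full latency between dependent instructions.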

Abstract

In this paper, a 3-D vertex processor with a floating-point four-threaded and four-issue expanded VLIW architecture and vertex caches for mobile multimedia applications is proposed. The multi-threaded datapath prevents data hazards, and the multi-issue expanded VLIW architecture gives the processor the opportunity to execute instructions in a parallel and well-balanced way. Efficient vertex caches are proposed and implemented for embedded vertex processors to accelerate their geometry operations and to save bandwidth between hosts and vertex processors. The proposed architecture with the vertex caches reduces the average total energy dissipation by 44.7% compared to a conventional single-threaded SIMD architecture. The proposed vertex processor achieves 120 Mvertices/s of geometry performance, which is 3.3 times faster than the previous result, and it supports OpenGL ES 2.0 and Vertex Shader Model 3.0. The processor is implemented in a 0.18-μm 1P4M CMOS process, and the operating frequency is 100 MHz.
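To make the bandwidth argument for vertex caches concrete, the following minimal sketch counts how many vertices must be fetched and processed for an indexed triangle list, with and without a small cache. The abstract does not describe the cache organization, so the FIFO replacement policy, the cache size of 4, and the test mesh (a triangle strip expanded into independent triangles) are all assumptions made for illustration.

```python
from collections import deque


def count_vertex_accesses(indices, cache_size):
    """Return (misses, hits) for a FIFO cache keyed by vertex index.

    A miss stands for a vertex that must be fetched from the host and
    processed; a hit reuses an already processed vertex.
    Assumption: FIFO replacement, which is a common simple policy and
    not necessarily the one used in the paper.
    """
    cache = deque(maxlen=cache_size)  # the oldest entry is evicted automatically
    misses = hits = 0
    for idx in indices:
        if idx in cache:
            hits += 1
        else:
            misses += 1
            cache.append(idx)
    return misses, hits


def strip_as_triangle_list(num_triangles):
    """Index list for a strip of triangles sharing edges: (0,1,2), (1,2,3), ..."""
    indices = []
    for i in range(num_triangles):
        indices.extend((i, i + 1, i + 2))
    return indices


if __name__ == "__main__":
    indices = strip_as_triangle_list(8)
    for size in (0, 4):  # size 0 models a processor without a vertex cache
        misses, hits = count_vertex_accesses(indices, size)
        print(f"cache_size={size}: {misses} fetched, {hits} reused")
```

For this mesh, every vertex reference is fetched when there is no cache, whereas with four entries only the first reference to each vertex is a miss. Real meshes and index orderings will give different hit rates.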