JSSC 2007, No. 10 · Memory · 0.18 μm

An Energy-Efficient Mobile Vertex Processor With Multithread Expanded VLIW Architecture and

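Proposes an energy-efficient mobile vertex processor that uses a multithreaded expanded VLIW architecture and vertex caches.
0.18 μm CMOS, 100 MHz, 120 Mvertices/s, 44.7% energy reduction
Vertex processor · VLIW architecture · Multithreading · Energy efficiency · OpenGL ES 2.0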
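Innovation 1: Multithreaded datapath that eliminates data hazards (system innovation). Four-threaded processing avoids the data dependencies and resource conflicts of a conventional single-threaded architecture, which raises instruction execution efficiency and supports geometry performance of up to 120 Mvertices/s.
Innovation 2: Four-threaded, four-issue expanded VLIW parallel architecture (architecture innovation). The expanded very long instruction word (VLIW) design, combined with multithreaded scheduling, exploits instruction-level parallelism (ILP) and thread-level parallelism (TLP) together. Running at 100 MHz, the processor is 3.3 times faster than the previous result.
Innovation 3: Dedicated vertex cache design (circuit innovation). The vertex caches reduce the bandwidth required between the host and the vertex processor and accelerate geometry operations locally. Together with the proposed architecture, they lower the average total energy dissipation by 44.7% compared with a conventional single-threaded SIMD architecture.
Innovation 4: Support for OpenGL ES 2.0 and Vertex Shader Model 3.0 (compatibility innovation). A standards-compliant hardware design ensures compatibility with mainstream graphics APIs and broadens the range of mobile multimedia applications the processor can serve.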
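The way a multithreaded datapath can hide data hazards may be easier to see in a toy model. The sketch below is not the paper's design: it assumes a single-issue pipeline, a hypothetical result latency of 4 cycles, and a worst-case instruction stream in which every instruction depends on the previous instruction of the same thread. The function name `cycles_to_issue` and all parameters are illustrative only.

```python
def cycles_to_issue(num_threads: int, instrs_per_thread: int, latency: int) -> int:
    """Count cycles needed to issue every instruction, one issue slot per cycle.

    Assumption: each instruction depends on the previous instruction of the
    same thread, whose result is available `latency` cycles after it issues.
    Threads are selected round-robin; a cycle with no ready thread is a stall.
    """
    ready_at = [0] * num_threads                    # earliest issue cycle per thread
    remaining = [instrs_per_thread] * num_threads   # instructions left per thread
    cycle = 0
    next_thread = 0                                 # round-robin pointer
    while any(remaining):
        for k in range(num_threads):
            t = (next_thread + k) % num_threads
            if remaining[t] and ready_at[t] <= cycle:
                remaining[t] -= 1
                ready_at[t] = cycle + latency       # dependent instruction must wait
                next_thread = (t + 1) % num_threads
                break
        cycle += 1                                  # advances even if nothing issued
    return cycle


# Same total work (32 instructions), hypothetical latency of 4 cycles.
single = cycles_to_issue(num_threads=1, instrs_per_thread=32, latency=4)
four = cycles_to_issue(num_threads=4, instrs_per_thread=8, latency=4)
print(single, four)
```

Under these assumptions the single-threaded run needs 125 cycles because it stalls on every dependency, while four interleaved threads issue all 32 instructions in 32 cycles: by the time a thread is selected again, its previous result is already available. The real processor additionally issues up to four operations per cycle, which this model does not capture.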
Abstract
In this paper, a 3-D vertex processor with a floating-point four-threaded and four-issue expanded VLIW architecture and vertex caches for mobile multimedia applications is proposed. The multi-threaded datapath prevents data hazards, and the multi-issue expanded VLIW architecture enables the processor to have an opportunity to execute instructions in parallel and a well-balanced way. The efficient vertex caches are proposed and implemented for the embedded vertex processors to accelerate its geometry operations and to save bandwidth between hosts and vertex processors. The proposed architecture with the vertex caches reduces the average total energy dissipation of 44.7% compared to a conventional single-threaded SIMD architecture, and the proposed vertex processor achieves 120 Mvertices/s of geometry performance which is 3.3 times faster than the previous result, and it supports OpenGL ES 2.0 and Vertex Shader Model 3.0. The processor is implemented in a 0.18-μm 1P4M CMOS process, and the operating frequency is 100 MHz.
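To illustrate why a vertex cache can save work and bandwidth, the sketch below counts how many vertices must be fetched and processed for an indexed triangle list. The abstract does not describe the cache organization, so everything here is an assumption: a small FIFO cache keyed by vertex index, a hypothetical capacity of 4 entries, and the helper name `count_vertex_runs`.

```python
from collections import deque


def count_vertex_runs(indices, cache_size):
    """Return how many vertices miss the cache and must be processed.

    Assumption: a FIFO cache keyed by vertex index; when it is full, the
    oldest entry is evicted. A cache_size of 0 models having no cache.
    """
    cache = deque(maxlen=cache_size)  # deque drops the oldest item when full
    runs = 0
    for idx in indices:
        if idx not in cache:          # miss: vertex must be fetched and processed
            runs += 1
            cache.append(idx)
    return runs


# Four triangles sharing edges: (0,1,2), (2,1,3), (2,3,4), (4,3,5)
indices = [0, 1, 2, 2, 1, 3, 2, 3, 4, 4, 3, 5]
no_cache = count_vertex_runs(indices, cache_size=0)
with_cache = count_vertex_runs(indices, cache_size=4)
print(no_cache, with_cache)
```

For this small mesh, the model processes 12 vertices without a cache and 6 with a 4-entry cache, since each shared vertex is reused while it is still resident. The measured savings in the paper depend on its actual cache design and workloads, which this sketch does not reproduce.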