JSSC 2009, Issue 5 · Digital Circuits · 0.18 μm CMOS
An Embedded Stream Processor Core Based on Logarithmic Arithmetic for a Low-Powe
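A low-power, high-performance 4-way 32-bit stream processor core based on logarithmic arithmetic is developed for handheld 3-D graphics systems.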
200 MHz, 86.8 mW, 141 Mvertices/s, 17.2 mm², 1.57 M transistors, 29 kB SRAM
Stream processor · Logarithmic arithmetic · Low power · 3-D graphics · Vertex shader
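▸Innovation 1: Logarithmic arithmetic combined with an adaptive number conversion scheme improves computational efficiency. With the floating-point arithmetic unit optimized in this way, the design reaches single-cycle throughput for every operation except matrix-vector multiplication, which takes 2 cycles per result (versus 4 cycles for the conventional approach). Performance improves by 17.5% and power drops by 44.7%.
▸Innovation 2: Embedded register index calculation reduces the number of instruction cycles through hardware-level optimization. The design cuts the cycle count of OpenGL transformation and lighting (TnL) operations by 19.1%, improving the instruction execution efficiency of the stream processor core.
▸Innovation 3: Dynamic reconfiguration of the functional units and logarithmic-domain operand forwarding support multi-mode operation (matrix, vector, and elementary functions). Reusing hardware resources reduces area by 39.4% while sustaining a peak geometry transformation performance of 141 Mvertices/s.
▸Innovation 4: Dynamic voltage and frequency scaling (DVFS) across three power domains provides SoC-level power management. Power consumption is only 52.4 mW at 60 fps, 50.5% lower than comparable designs, which suits low-power handheld devices.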
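The logarithmic arithmetic named in the first point can be illustrated with a small software sketch. It shows only the general idea of a log number system: once operands are stored as a sign and log2 of the magnitude, multiplication, division, and reciprocal square root reduce to additions, subtractions, and scalings of the exponent. This is not the paper's hardware unit or its adaptive number conversion scheme. All function names (`to_log`, `from_log`, `log_mul`, `log_div`, `log_rsqrt`, `dot4`) are hypothetical, and the conversions use Python's exact `math.log2` where hardware would use an approximation.

```python
import math
from typing import Sequence, Tuple

# Hypothetical representation: (sign, log2 of magnitude). Not the paper's format.
LogNum = Tuple[int, float]

def to_log(x: float) -> LogNum:
    """Convert a nonzero real number into the log domain."""
    if x == 0.0:
        # Zero has no finite logarithm; a real design needs a special case.
        raise ValueError("zero cannot be represented in this sketch")
    return (-1 if x < 0 else 1, math.log2(abs(x)))

def from_log(v: LogNum) -> float:
    """Convert a log-domain value back to a linear-domain float."""
    sign, exponent = v
    return sign * 2.0 ** exponent

def log_mul(a: LogNum, b: LogNum) -> LogNum:
    """Multiplication becomes an addition of the log2 magnitudes."""
    return (a[0] * b[0], a[1] + b[1])

def log_div(a: LogNum, b: LogNum) -> LogNum:
    """Division becomes a subtraction of the log2 magnitudes."""
    return (a[0] * b[0], a[1] - b[1])

def log_rsqrt(a: LogNum) -> LogNum:
    """Reciprocal square root becomes a scaling of the exponent by -1/2."""
    if a[0] < 0:
        raise ValueError("reciprocal square root of a negative number")
    return (1, -0.5 * a[1])

def dot4(u: Sequence[float], v: Sequence[float]) -> float:
    """4-element dot product: products in the log domain, sum in the linear domain.

    Pairs containing a zero are skipped because their product is zero.
    """
    total = 0.0
    for x, y in zip(u, v):
        if x != 0.0 and y != 0.0:
            total += from_log(log_mul(to_log(x), to_log(y)))
    return total
```

For example, `from_log(log_mul(to_log(3.0), to_log(-4.0)))` agrees with -12 up to floating-point rounding. Accumulation is done in the linear domain in this sketch because addition has no simple log-domain equivalent, which is one reason conversion between the two domains matters in designs of this kind.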
Abstract
and Hoi-Jun Yoo, Fellow, IEEE
Abstract—A low-power and high-performance 4-way 32-bit
stream processor core is developed for handheld low-power
3-D graphics systems. It contains a floating-point unified matrix,
vector, and elementary function unit. By exploiting the logarithmic
arithmetic and the proposed adaptive number conversion scheme,
a 4-way arithmetic unit achieves a single-cycle throughput for
all these operations except for the matrix-vector multiplication
that takes 2 cycles per result,