JSSC 2007, Issue 1 · Clocking & PLLs · 90 nm
The Design and Implementation of the Massively Parallel Processor Based on the Matrix Architecture
Hideyuki Noda, Masami Nakajima, Katsumi Dosaka, Kiyoshi Nakata, Motoki Higashida, Osamu Yamamoto, Katsuya Mizumoto, Tetsushi Tanizaki, Takayuki Gyohten, Yoshihiro Okuno, Hiroyuki Kondo, Yukihiko Shimazu
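Designs and implements a massively parallel processor based on a matrix architecture, aimed at portable multimedia applications.
40 GOPS on 16-bit fixed-point additions at a 200 MHz clock, with 250 mW power dissipation
Parallel processor · Multimedia applications · Energy efficiency · Matrix architecture · SRAM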
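▸ Innovation 1: Energy-efficient design (system innovation) - The paper proposes a massively parallel processor based on a matrix architecture. Implemented in 90 nm CMOS low standby technology, it dissipates 250 mW while sustaining 40 GOPS, which suits portable multimedia applications.
▸ Innovation 2: Flexible switching network (method innovation) - 2048 2-bit-grained processing elements are connected by a flexible switching network, which raises data-parallel processing capability and supports efficient processing of multimedia data streams.
▸ Innovation 3: High integration density (circuit innovation) - 1 Mbit of SRAM data registers and 2048 processing elements are integrated in an area of only 3.1 mm², a density well suited to resource-constrained embedded systems.
▸ Innovation 4: Balance of performance and power (system innovation) - At a 200 MHz clock the processor sustains 40 GOPS on consecutive 16-bit fixed-point additions while dissipating only 250 mW.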
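The headline figures in the highlights above imply a few derived metrics that make the efficiency claim easier to compare. The following is a minimal sketch of that arithmetic. The constants are taken from the summary. The function name and the "cycles per addition per PE" figure are illustrative assumptions (the latter presumes every processing element works on its own addition and is always busy) and are not stated by the paper.

```python
# Figures quoted in the summary (not measured here).
CLOCK_HZ = 200e6        # 200 MHz clock
THROUGHPUT_OPS = 40e9   # 40 GOPS, consecutive 16-bit fixed-point additions
POWER_W = 0.250         # 250 mW
AREA_MM2 = 3.1          # 3.1 mm^2
NUM_PES = 2048          # 2-bit-grained processing elements


def derived_metrics():
    """Return metrics derived purely from the quoted figures (hypothetical helper)."""
    return {
        # Energy efficiency in GOPS per watt.
        "gops_per_watt": (THROUGHPUT_OPS / 1e9) / POWER_W,
        # Energy spent per 16-bit addition, in picojoules.
        "picojoules_per_add": POWER_W / THROUGHPUT_OPS * 1e12,
        # Area efficiency in GOPS per square millimetre.
        "gops_per_mm2": (THROUGHPUT_OPS / 1e9) / AREA_MM2,
        # Additions completed per clock cycle across the whole array.
        "adds_per_cycle": THROUGHPUT_OPS / CLOCK_HZ,
        # ASSUMPTION: one addition per PE, all PEs busy. Implied cycles per add.
        "cycles_per_add_per_pe": NUM_PES * CLOCK_HZ / THROUGHPUT_OPS,
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the quoted numbers work out to about 160 GOPS/W and roughly 6.25 pJ per 16-bit addition.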
Abstract
This paper describes the design and implementation of the massively parallel processor based on the matrix architecture which is suitable for portable multimedia applications. The proposed architecture in this paper achieves the high performance of 40 GOPS in the case of consecutive fixed-point 16-bit additions at 200 MHz clock frequency and the small power dissipation of 250 mW. In addition, 1 Mbit SRAM for data registers and 2048 2-bit-grained processing elements connected by a flexible switching network are integrated in the small area of 3.1 mm² in 90 nm CMOS low standby technology. These design techniques and architectures described in this paper are attractive for realizing area-efficient, energy-efficient, and high-performance multimedia processors.
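The abstract does not spell out how a 2-bit-grained processing element performs a 16-bit addition. One plausible reading, offered here only as an illustrative assumption, is that each element consumes its operands two bits at a time and carries between slices, with many elements running the same steps in lockstep on different data. The sketch below emulates that idea in plain Python. The names `add_2bit_serial` and `simd_add` are hypothetical, and the step count is a property of this sketch, not a cycle count reported by the paper.

```python
def add_2bit_serial(a, b, width=16):
    """Add two unsigned `width`-bit integers two bits per step.

    Returns (sum modulo 2**width, carry out, number of 2-bit steps).
    ASSUMPTION: models a generic 2-bit-serial adder, not the paper's circuit.
    """
    carry = 0
    result = 0
    steps = 0
    for shift in range(0, width, 2):
        slice_a = (a >> shift) & 0b11      # next 2-bit slice of a
        slice_b = (b >> shift) & 0b11      # next 2-bit slice of b
        partial = slice_a + slice_b + carry
        result |= (partial & 0b11) << shift  # keep the low 2 bits
        carry = partial >> 2                 # pass the carry to the next slice
        steps += 1
    return result, carry, steps


def simd_add(xs, ys, width=16):
    """Apply the same serial addition to every lane, one lane per element."""
    return [add_2bit_serial(x, y, width)[0] for x, y in zip(xs, ys)]


if __name__ == "__main__":
    print(add_2bit_serial(1234, 4321))
    print(simd_add([1, 2, 0xFFFF], [1, 2, 1]))
```

In hardware the lanes would advance together rather than one after another. The list comprehension only stands in for that parallelism.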