JSSC 2020, Issue 4 · Digital Circuits · 16 nm
A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference A
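A scalable multi-chip-module-based DNN accelerator for high-efficiency inference
16 nm process, 1.29 TOPS/mm² area efficiency, 0.11 pJ/op energy efficiency, 127.8 TOPS peak performance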
Deep neural networks · Multi-chip module · Energy-efficiency optimization · Scalable architecture · Inference acceleration
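▸ Innovation 1: Multi-chip-module (MCM) mesh network. A mesh of 36 chips enables flexible scaling, covering DNN inference needs from mobile devices to data centers and improving system scalability and adaptability.
▸ Innovation 2: Ground-referenced signaling (GRS). GRS is used for chip-to-chip communication, reducing signaling power and noise and improving overall communication efficiency and reliability.
▸ Innovation 3: Hierarchical network-on-chip and network-on-package. Combining distributed on-chip weight storage with a hierarchical network design minimizes communication energy and improves overall system energy efficiency and performance.
▸ Innovation 4: Headline metrics. In a 16 nm process, the design achieves 1.29 TOPS/mm² area efficiency and 0.11 pJ/op energy efficiency; the 36-chip system reaches 127.8 TOPS peak performance and 1903 images/s on ResNet-50 inference.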
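The headline metrics above can be converted into a few more familiar units with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, it assumes 1 TOPS = 10^12 operations per second, and the per-chip figure is a plain average over the 36 chips rather than a measured single-chip number. The summary does not say whether the energy and peak-performance figures were taken at the same operating point, so the sketch does not multiply them to estimate power.

```python
# Unit conversions for the headline metrics quoted in the summary.
# Assumption: 1 TOPS = 1e12 ops/s. All helper names are hypothetical.

PJ_PER_OP = 0.11               # energy per operation, in picojoules
PEAK_TOPS_36_CHIPS = 127.8     # peak throughput of the 36-chip system
NUM_CHIPS = 36
RESNET50_IMAGES_PER_S = 1903   # ResNet-50 inference throughput


def tops_per_watt(pj_per_op: float) -> float:
    """Convert pJ/op to TOPS/W.

    ops per joule = 1 / (pj_per_op * 1e-12); dividing by 1e12 to express
    the result in tera-ops leaves simply 1 / pj_per_op.
    """
    return 1.0 / pj_per_op


def mean_tops_per_chip(total_tops: float, chips: int) -> float:
    """Average peak throughput per chip (a plain division, not a measurement)."""
    return total_tops / chips


def ms_per_image(images_per_s: float) -> float:
    """Average time per image in milliseconds at the given throughput."""
    return 1000.0 / images_per_s


if __name__ == "__main__":
    print(f"{tops_per_watt(PJ_PER_OP):.2f} TOPS/W")
    print(f"{mean_tops_per_chip(PEAK_TOPS_36_CHIPS, NUM_CHIPS):.2f} TOPS per chip (average)")
    print(f"{ms_per_image(RESNET50_IMAGES_PER_S):.3f} ms per image (average)")
```

Under these assumptions, 0.11 pJ/op corresponds to roughly 9.09 TOPS/W, 127.8 TOPS over 36 chips averages 3.55 TOPS per chip, and 1903 images/s is about 0.525 ms per image on average.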
Abstract
Custom accelerators improve the energy efficiency,
area efficiency, and performance of deep neural network (DNN)
inference. This article presents a scalable DNN accelerator
consisting of 36 chips connected in a mesh network on a multi-
chip-module (MCM) using ground-referenced signaling (GRS).
While previous accelerators fabricated on a single monolithic chip
are optimal for specific network sizes, the proposed architecture
enables flexible scaling for efficient inference on a wide range
of DNNs, fro