JSSC 2020, Issue 4 | Digital Circuits | 16nm

A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference A

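A scalable multi-chip-module-based DNN accelerator for energy-efficient inference
16 nm process, 1.29 TOPS/mm² area efficiency, 0.11 pJ/op energy efficiency, 127.8 TOPS peak performance
Deep neural networks | Multi-chip module | Energy-efficiency optimization | Scalable architecture | Inference acceleration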
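Innovation 1: Multi-chip-module (MCM) mesh network. Thirty-six chips are connected in a mesh network on the package, so the system scales flexibly to cover DNN inference workloads ranging from mobile devices to data centers.
Innovation 2: Ground-referenced signaling (GRS). GRS is used for chip-to-chip communication, reducing signaling power and noise and improving the efficiency and reliability of inter-chip links.
Innovation 3: Hierarchical network-on-chip and network-on-package. Combined with distributed on-chip weight storage, the hierarchical network design minimizes communication energy and improves overall system energy efficiency and performance.
Innovation 4: Headline metrics. In a 16 nm process, the prototype achieves 1.29 TOPS/mm² area efficiency and 0.11 pJ/op energy efficiency; the 36-chip system reaches 127.8 TOPS peak performance and 1903 images/s on ResNet-50 inference.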
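The headline figures above are related by simple unit arithmetic, which the following minimal sketch makes explicit. The function names are hypothetical, and the derived values are back-of-the-envelope conversions from the rounded numbers quoted in this summary, not additional measurements from the paper.

```python
# Figures quoted in the summary above (16 nm prototype).
ENERGY_PJ_PER_OP = 0.11        # energy efficiency, pJ per operation
SYSTEM_PEAK_TOPS = 127.8       # peak throughput of the 36-chip system
NUM_CHIPS = 36                 # chips in the MCM mesh
RESNET50_IMAGES_PER_S = 1903   # ResNet-50 inference throughput


def pj_per_op_to_tops_per_watt(pj_per_op: float) -> float:
    """Convert pJ/op to TOPS/W.

    1 TOPS/W = 1e12 op/J, and 1 pJ/op = 1e-12 J/op, so the two
    quantities are reciprocals of each other.
    """
    return 1.0 / pj_per_op


def mean_tops_per_chip(system_tops: float, num_chips: int) -> float:
    """Average share of the system peak per chip (plain division)."""
    return system_tops / num_chips


def mean_ms_per_image(images_per_s: float) -> float:
    """Average time per image implied by a throughput figure."""
    return 1000.0 / images_per_s


if __name__ == "__main__":
    print(f"{pj_per_op_to_tops_per_watt(ENERGY_PJ_PER_OP):.2f} TOPS/W")
    print(f"{mean_tops_per_chip(SYSTEM_PEAK_TOPS, NUM_CHIPS):.2f} TOPS per chip")
    print(f"{mean_ms_per_image(RESNET50_IMAGES_PER_S):.3f} ms per image")
```

With the rounded inputs, 0.11 pJ/op corresponds to roughly 9.1 TOPS/W, and 127.8 TOPS across 36 chips averages 3.55 TOPS per chip. These averages assume every chip contributes equally at the same operating point, which the summary does not state.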
Abstract
Custom accelerators improve the energy efficiency, area efficiency, and performance of deep neural network (DNN) inference. This article presents a scalable DNN accelerator consisting of 36 chips connected in a mesh network on a multi-chip-module (MCM) using ground-referenced signaling (GRS). While previous accelerators fabricated on a single monolithic chip are optimal for specific network sizes, the proposed architecture enables flexible scaling for efficient inference on a wide range of DNNs, fro