JSSC 2022, Issue 1 · Memory · 16nm

Scalable and Programmable Neural Network Inference Accelerator Based on In-Memory Computing

A programmable neural network inference accelerator based on in-memory computing, supporting efficient parallel execution.
16nm CMOS, 3 TOPS, 30 TOPS/W (8-bit)
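The headline figures can be turned into per-operation quantities. This is a minimal sketch. It assumes that each counted operation is one 8-b MAC, as the abstract's "MAC-level" wording suggests. It also assumes, hypothetically, that peak throughput and peak efficiency occur at the same operating point, which this summary does not confirm.

```python
def energy_per_op_fj(tops_per_watt: float) -> float:
    """Energy per operation in femtojoules: 1 / (TOPS/W * 1e12) joules."""
    return 1.0 / (tops_per_watt * 1e12) * 1e15

def implied_power_mw(tops: float, tops_per_watt: float) -> float:
    """Power in milliwatts, ASSUMING both peaks hold at the same operating point."""
    return tops / tops_per_watt * 1e3

print(f"{energy_per_op_fj(30.0):.1f} fJ per 8-b MAC")   # prints: 33.3 fJ per 8-b MAC
print(f"{implied_power_mw(3.0, 30.0):.0f} mW at 3 TOPS")  # prints: 100 mW at 3 TOPS
```

If the two peaks were measured at different supply or clock settings, the 100 mW figure is only an order-of-magnitude estimate.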
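In-memory computing · Neural network accelerator · Programmable architecture · Mixed-signal computing · High energy efficiency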
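Innovation 1: High-SNR capacitor-based analog computing (circuit innovation). Capacitor-based analog computation substantially raises the signal-to-noise ratio (SNR) and preserves the accuracy of the analog computation. Measured results match bit-true simulations, and the design reaches a peak throughput of 3 TOPS and an energy efficiency of 30 TOPS/W for 8-bit operations.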
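A toy behavioral model shows why high SNR lets analog results match bit-true simulation. The model rests on assumptions that this summary does not state. Inputs are applied bit-serially. Weights are stored bit-parallel across columns. Each column sums binary products by charge sharing on equal unit capacitors. An ideal ADC digitizes the result. All function names are hypothetical. As long as analog noise stays well below half an ADC LSB, every digitized column count is exact, so the digitally recombined result equals the integer dot product.

```python
import random

def column_count(xs, ws, noise_lsb=0.0, rng=None):
    """Toy charge-domain column (hypothetical model): equal unit capacitors
    share charge, so the column voltage is the fraction of cells whose
    x AND w is 1. An ideal ADC with len(xs) + 1 levels maps it back to a count.
    noise_lsb is Gaussian noise expressed in ADC LSBs."""
    n = len(xs)
    v = sum(xb & wb for xb, wb in zip(xs, ws)) / n  # normalized to 0..1
    if noise_lsb and rng is not None:
        v += rng.gauss(0.0, noise_lsb / n)  # one LSB = 1/n of full scale
    return min(n, max(0, round(v * n)))  # quantize, clamp to ADC range

def bpbs_dot(x, w, bx=8, bw=8, noise_lsb=0.0, seed=0):
    """Unsigned integer dot product assembled from binary column counts:
    input bit i is applied serially, weight bit j sits in its own column,
    and the digital side weights each count by 2**(i + j)."""
    rng = random.Random(seed)
    acc = 0
    for i in range(bx):
        xs = [(v >> i) & 1 for v in x]
        for j in range(bw):
            ws = [(v >> j) & 1 for v in w]
            acc += column_count(xs, ws, noise_lsb, rng) << (i + j)
    return acc

x = [200, 17, 255, 3]
w = [5, 128, 64, 250]
exact = sum(a * b for a, b in zip(x, w))        # 20246
print(bpbs_dot(x, w) == exact)                  # True: ideal, noise-free ADC
print(bpbs_dot(x, w, noise_lsb=0.05) == exact)  # True: noise far below 0.5 LSB
```

In this toy model, raising `noise_lsb` toward 0.5 starts flipping ADC codes. The 2**(i + j) weighting then amplifies each error, so a high-SNR analog front end is what keeps the output bit-true.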
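Innovation 2: Configurable on-chip network and core array (system innovation). A programmable on-chip network (OCN) and a scalable array of cores address state-swapping overhead and hardware utilization, supporting both data and pipeline parallelism for flexible neural network mapping.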
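A hypothetical greedy allocator illustrates the trade-off between pipeline and data parallelism on a fixed core array. It is not the authors' mapping tool. It assumes that each layer forms one pipeline stage and that a stage's latency is its workload divided by the number of cores it is replicated across. It also assumes that OCN transfers and state swapping cost nothing. Using 16 cores to stand in for the 4 × 4 prototype, it repeatedly gives one extra core to the current bottleneck stage.

```python
def allocate_cores(layer_work, num_cores=16):
    """Hypothetical greedy mapper: every layer first gets one core
    (pipeline parallelism), then each spare core goes to the slowest
    stage (data parallelism by replication). Communication is ignored."""
    if num_cores < len(layer_work):
        raise ValueError("need at least one core per layer in this toy model")
    cores = [1] * len(layer_work)
    for _ in range(num_cores - len(layer_work)):
        stage_time = [w / c for w, c in zip(layer_work, cores)]
        cores[stage_time.index(max(stage_time))] += 1  # relieve the bottleneck
    bottleneck = max(w / c for w, c in zip(layer_work, cores))
    return cores, bottleneck

# Hypothetical per-layer workloads in arbitrary units (e.g. MACs per image).
cores, bottleneck = allocate_cores([4, 1, 1], num_cores=6)
print(cores, bottleneck)  # [4, 1, 1] 1.0
```

The first layer carries four times the work, so it ends up replicated across four cores, and every stage then takes the same time. A real mapper would also have to weigh OCN traffic and state-swapping overhead, which this sketch leaves out.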
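Innovation 3: Combining mixed-signal in-memory computing with digital computing (methodology innovation). Each core integrates analog in-memory computing (IMC) with digital SIMD computing, together with configurable buffering and programmable control. This improves matrix-vector multiply (MVM) efficiency and reduces memory-access overhead.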
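The following is one possible division of an MVM between a core's analog and digital halves. It is a sketch under stated assumptions, not the paper's dataflow. The weight matrix is split along its input dimension into tiles that each fit one (hypothetical) IMC array. Partial sums from the tiles are added digitally. A SIMD-style pass then applies bias, scaling, ReLU and 8-bit clamping element by element. The tile size, scale and all function names are illustrative.

```python
def imc_tile_mvm(w_rows, x, start, stop):
    """Stand-in for one IMC array (hypothetical): partial dot products over
    input indices [start, stop) for every output row, assumed exact."""
    return [sum(row[k] * x[k] for k in range(start, stop)) for row in w_rows]

def simd_postprocess(acc, bias, scale, out_max=255):
    """Element-wise near-memory digital ops (assumed output format): add bias,
    rescale, round half to even (Python's round), then ReLU and clamp."""
    return [min(out_max, max(0, round((a + b) * scale))) for a, b in zip(acc, bias)]

def tiled_mvm(w_rows, x, bias, scale, tile_inputs=4):
    acc = [0] * len(w_rows)
    for start in range(0, len(x), tile_inputs):
        stop = min(start + tile_inputs, len(x))
        partial = imc_tile_mvm(w_rows, x, start, stop)  # analog tile output
        acc = [a + p for a, p in zip(acc, partial)]     # digital accumulation
    return simd_postprocess(acc, bias, scale)

W = [[1, -2, 3, 0, 1, 1],
     [0, 1, 0, -1, 2, -3]]
x = [1, 2, 3, 4, 5, 6]
print(tiled_mvm(W, x, bias=[0, 2], scale=0.25))  # [4, 0]
```

The raw sums are 17 and -10. After bias and scaling they become 4.25 and -2.0, and rounding followed by ReLU yields [4, 0]. Because the weights stay in place and only activations move, tiling of this kind keeps weight traffic off the memory bus. That is the memory-access saving the summary attributes to IMC.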
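Innovation 4: Software toolchain support (system innovation). The accompanying software libraries and neural network mapping tools achieve measured accuracies of 91.51% on CIFAR-10 and 73.33% on ImageNet, validating the scalability of the hardware abstraction layer.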
Abstract
This work demonstrates a programmable in-memory-computing (IMC) inference accelerator for scalable execution of neural network (NN) models, leveraging a high-signal-to-noise ratio (SNR) capacitor-based analog technology. IMC accelerates computations and reduces memory accesses for matrix-vector multiplies (MVMs), which dominate in NNs. The accelerator architecture focuses on scalable execution, addressing the overheads of state swapping and the challenges of maintaining high utilization across highly dense and parallel hardware. The architecture is based on a configurable on-chip network (OCN) and scalable array of cores, which integrate mixed-signal IMC with programmable near-memory single-instruction multiple-data (SIMD) digital computing, configurable buffering, and programmable control. The cores enable flexible NN execution mappings that exploit data- and pipeline-parallelism to address utilization and efficiency across models. A prototype is presented, incorporating a 4 × 4 array of cores demonstrated in 16 nm CMOS, achieving peak multiply-accumulate (MAC)-level throughput of 3 TOPS and peak MAC-level energy efficiency of 30 TOPS/W, both for 8-b operations. The measured results show high accuracy of the analog computations, matching bit-true simulations. This enables the abstractions required for robust and scalable architectural and software integration. Developed software libraries and NN-mapping tools are used to demonstrate CIFAR-10 and ImageNet classification, wi