JSSC 2023, Issue 5 · Memory · 28 nm

PIMCA: A Programmable In-Memory Computing Accelerator for Energy-Efficient DNN Inference
Bo Zhang, Student Member, IEEE

Proposes PIMCA, a programmable in-memory computing accelerator for low-precision DNN inference that achieves energy-efficient computation.

28 nm CMOS, 1 V, 42 MHz, 437 TOPS/W, 49 TOPS peak

In-memory computing · deep neural network · analog-mixed-signal · programmable accelerator · energy-efficiency optimization

▸ Uses a custom 10T1C bitcell that supports capacitive-coupling-based analog-mixed-signal multiply-and-accumulate (MAC) operations

▸ Integrates 108 IMC SRAM macros that support single-cycle matrix-vector multiplication

▸ Custom six-stage pipeline and instruction set architecture (ISA), with hardware-loop support to reduce program size
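To illustrate why hardware-loop support shrinks a program, here is a minimal toy model. It is only a sketch: the mnemonics (`LOAD_ACT`, `IMC_VMM`, `SIMD_ACC`, `STORE_ACT`) and the `("LOOP", count, body_len)` encoding are hypothetical and are not taken from the PIMCA ISA. The sketch stores a repeated body once behind a loop instruction and checks that it expands to the same instruction trace as the fully unrolled program.

```python
# Toy model of a hardware-loop instruction.
# All mnemonics and the LOOP encoding are hypothetical, not the PIMCA ISA.

def expand(program):
    """Return the flat instruction trace that a looped program executes.

    A ("LOOP", count, body_len) entry repeats the next body_len
    instructions count times. Nested loops are handled recursively.
    """
    trace = []
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "LOOP":
            _, count, body_len = instr
            body = program[pc + 1 : pc + 1 + body_len]
            trace.extend(expand(body) * count)
            pc += 1 + body_len  # skip the loop header and its body
        else:
            trace.append(instr)
            pc += 1
    return trace


body = [("IMC_VMM",), ("SIMD_ACC",)]

# Program stored with one loop instruction in front of the body.
looped = [("LOAD_ACT",), ("LOOP", 4, 2), *body, ("STORE_ACT",)]

# Same work with the body written out four times.
unrolled = [("LOAD_ACT",)] + body * 4 + [("STORE_ACT",)]

reduction = 1 - len(looped) / len(unrolled)
print(len(looped), len(unrolled), reduction)
```

In this toy case the stored program drops from 10 to 5 instructions. The saving grows with the loop count and body length, so the figure here is not comparable to the reduction reported for real DNN programs.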

Abstract

This article presents a programmable in-memory computing accelerator (PIMCA) for low-precision (1–2 b) deep neural network (DNN) inference. The custom 10T1C bitcell in the in-memory computing (IMC) macro has four additional transistors and one capacitor to perform capacitive-coupling-based multiply and accumulation (MAC) in the analog-mixed-signal (AMS) domain. A macro containing 256 × 128 bitcells can simultaneously activate all the rows, and as a result, it can perform a matrix-vector multiplication (VMM) in one cycle. PIMCA integrates 108 such IMC static random-access memory (SRAM) macros with a custom six-stage pipeline and a custom instruction set architecture (ISA) for instruction-level programmability. The results of the IMC macros are fed to a single-instruction-multiple-data (SIMD) processor for other computations such as partial sum accumulation, max-pooling, and activation functions. To use the IMC and SIMD datapaths effectively, we customize the ISA, in particular by adding hardware loop support, which reduces the program size by up to 73%. The accelerator is prototyped in a 28-nm technology and integrates a total of 3.4-Mb IMC SRAM and 1.5-Mb off-the-shelf activation SRAM, making it one of the largest IMC accelerators to date. It achieves a system-level energy efficiency of 437 TOPS/W and a peak throughput of 49 TOPS at a 42-MHz clock frequency and 1-V supply for VGG9 and ResNet-18 on the CIFAR-10 dataset.
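The macro organization described in the abstract can be sketched as an idealized functional model. This is a digital stand-in only: it assumes 1-b operation with inputs and weights encoded as +1/-1 (an assumption, since the abstract does not state the encoding), and it ignores the analog capacitive coupling, any readout quantization, and the 2-b mode. The function names `macro_vmm` and `total_imc_bits` are illustrative. The sketch also checks that 108 macros of 256 × 128 bitcells are consistent with the quoted 3.4-Mb IMC SRAM capacity.

```python
import random

MACRO_ROWS = 256  # rows activated simultaneously (from the abstract)
MACRO_COLS = 128  # columns, i.e. outputs per macro (from the abstract)
NUM_MACROS = 108  # IMC SRAM macros integrated in PIMCA (from the abstract)


def macro_vmm(inputs, weights):
    """Ideal 1-b product of a 256-element input with a 256 x 128 weight array.

    inputs:  list of MACRO_ROWS values, each +1 or -1 (assumed encoding)
    weights: MACRO_ROWS x MACRO_COLS nested list of +1 or -1
    Returns one accumulated sum per column, as a list of MACRO_COLS ints.
    """
    assert len(inputs) == MACRO_ROWS
    assert len(weights) == MACRO_ROWS
    assert all(len(row) == MACRO_COLS for row in weights)
    return [
        sum(inputs[r] * weights[r][c] for r in range(MACRO_ROWS))
        for c in range(MACRO_COLS)
    ]


def total_imc_bits():
    """Total IMC bitcells across all macros (one weight bit per bitcell)."""
    return NUM_MACROS * MACRO_ROWS * MACRO_COLS


rng = random.Random(0)  # fixed seed so the example is repeatable
inputs = [rng.choice((-1, 1)) for _ in range(MACRO_ROWS)]
weights = [
    [rng.choice((-1, 1)) for _ in range(MACRO_COLS)]
    for _ in range(MACRO_ROWS)
]

sums = macro_vmm(inputs, weights)
print(len(sums))                 # one result per column
print(total_imc_bits())          # total bitcells
print(total_imc_bits() / 2**20)  # capacity in Mb, taking 1 Mb = 2**20 bits
```

With 1 Mb taken as 2**20 bits, the total comes to 3.375 Mb, which rounds to the 3.4 Mb stated in the abstract. In hardware, all 256 row contributions to a column are accumulated in the analog domain in one cycle, whereas this model simply sums them in software.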