JSSC 2023, Issue 3 · Memory · CIM

T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training

T-PIM is an energy-efficient processing-in-memory accelerator that supports end-to-end on-device training.

No specific performance figures are explicitly stated.

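Processing-in-memory · On-device training · Neural network accelerator · Energy-efficiency optimization · Variable bit-precision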

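▸ Uses 8T-SRAM cells for energy-efficient processing-in-memory

▸ Bit-serial computation logic that supports variable bit-precision

▸ Novel tiling scheme that supports neural networks of any size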
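
As a rough illustration of how bit-serial logic can build a multi-bit dot product out of 1-bit AND operations, the sketch below processes one bit plane at a time and combines the partial sums by shift-and-add. It is a software analogy only, not the T-PIM circuit: the function name `bit_serial_dot`, the restriction to unsigned integers, and the loop order are all assumptions made for illustration.

```python
def bit_serial_dot(inputs, weights, input_bits, weight_bits):
    """Dot product of unsigned integers using only 1-bit ANDs and shift-adds.

    Hypothetical helper for illustration; values are assumed unsigned and
    are truncated to the requested number of low-order bits.
    """
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    acc = 0
    for i in range(input_bits):        # one input bit plane per step (bit-serial)
        for j in range(weight_bits):   # one stored weight bit plane
            # 1-bit AND for every element pair, then count the ones
            ones = sum(((x >> i) & 1) & ((w >> j) & 1)
                       for x, w in zip(inputs, weights))
            # weight the partial sum by the significance of both bits
            acc += ones << (i + j)
    return acc


full = bit_serial_dot([3, 5, 0, 7], [2, 1, 4, 3], input_bits=3, weight_bits=3)
reduced = bit_serial_dot([3, 5, 0, 7], [2, 1, 4, 3], input_bits=2, weight_bits=3)
```

Because the input precision is just a loop bound, lowering `input_bits` shortens the computation at the cost of truncating the inputs to their low-order bits, which is the sense in which a bit-serial scheme makes precision variable.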

Abstract

Recently, on-device training has become crucial for the success of edge intelligence. However, frequent data movement between computing units and memory during training has been a major problem for battery-powered edge devices. Processing-in-memory (PIM) is a novel computing paradigm that merges computing logic into memory, which can address the data movement problem with excellent power efficiency. However, previous PIM accelerators cannot support the entire training process on chip due to its computing complexity. This article presents a PIM accelerator for end-to-end on-device training (T-PIM), the first PIM realization that enables end-to-end on-device training as well as high-speed inference. Its full-custom PIM macro contains 8T-SRAM cells to perform the energy-efficient in-cell AND operation and the bit-serial-based computation logic enables fully variable bit-precision for input data. The macro supports various data mapping methods and computational paths for both fully connected and convolutional layers, in order to handle the complex training process. An efficient tiling scheme is also proposed to enable T-PIM to compute any size of deep neural network with the implemented hardware. In addition, configurable arithmetic units in a forward propagation path make T-PIM handle power-of-two bit-precision for weight data, enabling a significant performance boost during inference. Finally, T-PIM efficiently handles sparsity in both operands by skipping the computation of zeros i