JSSC 2021, Issue 4 | Memory | 22 nm | Emerging Memory

RRAM-DNN: An RRAM and Model-Compression Empowered All-Weights-On-Chip DNN Accelerator

Proposes a DNN accelerator that uses RRAM and model compression to store all weights on chip, targeting mobile machine learning applications.

TSMC-22 nm ULL CMOS, 127.9 mW, 123 GOPs, 0.96 TOPs/W

RRAM | DNN accelerator | model compression | mobile machine learning | on-chip storage

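▸ Innovation 1: All weights stored on chip in RRAM. A 24 Mb RRAM holds 16 million 8-bit weights on a single chip, eliminating energy-hungry off-chip memory accesses (system innovation). Weight pruning, non-linear quantization, and Huffman encoding compress the stored weights; the measured energy efficiency is 0.96 TOPs/W.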

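▸ Innovation 2: Dynamic clamping offset-canceling sense amplifier (DCOCSA). A customized RRAM macro built around this sense amplifier achieves sub-microampere input offset in TSMC 22 nm ULL CMOS, improving RRAM read accuracy (circuit innovation).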

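▸ Innovation 3: On-chip decompression and memory error-resilient scheme. A hardware-friendly decompression architecture is paired with an error-recovery mechanism (method innovation). Compressed weights are decompressed in real time, and redundant encoding tolerates RRAM storage errors, so that large models such as ResNet-18 can run complete inference.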

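▸ Innovation 4: Four-core programmable parallel architecture. A multi-core design adapts to different neural network configurations (system innovation). Peak performance is 123 GOPs at 8-bit precision, full-model inference runs at 127.9 mW, and core utilization improves by more than 30%.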
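
The compression and decompression steps named in the first and third highlights can be sketched as a software round trip. This is a minimal illustration, not the chip's actual scheme: the 70% sparsity, the power-of-two quantizer with 7 levels per sign, the random test weights, and all function names are assumptions made for the example, and the error-resilient scheme is not modeled. For scale, the reported figures (16 million 8-bit weights in 24 Mb of RRAM) imply an overall compression ratio of at least about 5x.

```python
import heapq
import math
import random
from collections import Counter

def prune(weights, sparsity):
    """Magnitude pruning: zero out roughly the smallest `sparsity` fraction."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_log(weights, levels=7):
    """Non-linear (power-of-two) quantization. Symbol 0 marks a pruned weight;
    symbols 1..levels are positive, levels+1..2*levels are negative."""
    max_abs = max(abs(w) for w in weights)
    symbols = []
    for w in weights:
        if w == 0.0:
            symbols.append(0)
            continue
        exp = min(levels - 1, round(-math.log2(abs(w) / max_abs)))
        symbols.append(1 + exp if w > 0 else 1 + levels + exp)
    return symbols, max_abs

def dequantize(symbol, max_abs, levels=7):
    """Map a symbol back to its representative weight value."""
    if symbol == 0:
        return 0.0
    sign = 1.0 if symbol <= levels else -1.0
    return sign * max_abs * 2.0 ** -((symbol - 1) % levels)

def huffman_codes(symbols):
    """Build a prefix-free code (symbol -> bit string) from symbol counts."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n0, _, left = heapq.heappop(heap)
        n1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (n0 + n1, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, codes):
    """Walk the bitstream, emitting a symbol whenever a codeword completes."""
    lookup = {c: s for s, c in codes.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out

random.seed(0)
weights = [random.gauss(0.0, 0.05) for _ in range(10_000)]  # synthetic data
symbols, max_abs = quantize_log(prune(weights, sparsity=0.7))
codes = huffman_codes(symbols)
bits = "".join(codes[s] for s in symbols)
restored = [dequantize(s, max_abs) for s in decode(bits, codes)]
ratio = 8 * len(weights) / len(bits)
print(f"{len(bits)} bits, {ratio:.1f}x smaller than dense 8-bit storage")
```

Because pruning makes the zero symbol dominant, Huffman coding assigns it a very short codeword, which is where most of the saving in this toy example comes from. The ratio printed here depends entirely on the assumed sparsity and quantizer and should not be read as a result from the paper.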

Abstract

This article presents an energy-efficient deep neural network (DNN) accelerator with non-volatile embedded resistive random access memory (RRAM) for mobile machine learning (ML) applications. This DNN accelerator implements weight pruning, non-linear quantization, and Huffman encoding to store all weights on RRAM, enabling single-chip processing for large neural network models without external memory. A four-core parallel and programmable architecture adapts to various neural network configurations with high utilization. We introduce a customized RRAM macro with a dynamic clamping offset-canceling sense amplifier (DCOCSA) that achieves sub-microampere input offset. The on-chip decompression and memory error-resilient scheme enables 16 million (M) 8-bit (decompressed) weights on a single-chip using 24 Mb RRAM. The proposed RRAM-DNN is the first digital DNN accelerator featuring 24 Mb RRAM as all-on-chip weight storage to eliminate energy-consuming off-chip memory accesses. The fabricated design performs the complete inference process of the ResNet-18 model while consuming 127.9 mW power in TSMC-22 nm ULL CMOS. The RRAM-DNN accelerator achieves peak performance of 123 GOPs with 8-bit precision, exhibiting measured energy efficiency of 0.96 TOPs/W.
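
The reported energy efficiency can be cross-checked against the other two reported numbers. The short calculation below assumes that the 123 GOPs peak throughput and the 127.9 mW power refer to the same operating point, which the abstract does not state explicitly; the function name is illustrative.

```python
def tops_per_watt(gops: float, power_mw: float) -> float:
    """Convert throughput in GOPs and power in mW to TOPs/W."""
    ops_per_second = gops * 1e9
    watts = power_mw * 1e-3
    return ops_per_second / watts / 1e12

efficiency = tops_per_watt(123.0, 127.9)
print(f"{efficiency:.2f} TOPs/W")  # prints "0.96 TOPs/W"
```

Under that assumption the three figures are mutually consistent: 123 GOPs divided by 0.1279 W gives about 0.96 TOPs/W.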