JSSC 2023, Issue 5 · Memory · 28nm · Neural Network Accelerator

A Fully Bit-Flexible Computation in Memory Macro Using Multi-Functional Computing Bit Cell and Embedded Input Sparsity Sensing
Chun-Yen Yao, Tsung-Yen Wu, Han-Chung Liang, Yu-Kai Chen

Proposes a fully bit-flexible computation-in-memory (CIM) macro with high energy efficiency and a compact area.

28nm CMOS, 27.7 TOPS/mm², 291 TOPS/W

Computation in memory · Bit flexibility · Multi-functional computing · Energy efficiency · Area efficiency

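▸Innovation 1: Multi-functional computing bit cell - The paper proposes a novel multi-functional computing bit cell that integrates the MAC (multiply-accumulate) operation with A/D conversion, improving both computing efficiency and flexibility. Fabricated in 28-nm CMOS, the design achieves an area efficiency of 27.7 TOPS/mm² and an energy efficiency of 291 TOPS/W. This is a circuit-level innovation.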

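▸Innovation 2: Embedded input sparsity sensing - By sensing input sparsity inside the macro, the design reduces the energy-consuming A/D conversions and so raises overall energy efficiency. This method-level innovation is what drives the dynamic range scaling described in the next point.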

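▸Innovation 3: Self-adaptive dynamic range scaling - The proposed self-adaptive dynamic range (DR) scaling scheme adjusts the computation precision according to the characteristics of the input data, minimizing energy consumption while preserving computational accuracy. This system-level innovation improves the energy efficiency and flexibility of the CIM macro.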

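▸Innovation 4: Interleaved placement structure - The macro uses an interleaved placement structure to increase the weight-updating bandwidth and improve layout symmetry, which further benefits computing efficiency and system stability. This design-level innovation contributes to the high energy and area efficiency.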
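
The link between input sparsity sensing and dynamic range scaling in the points above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the paper's circuit: it assumes 1-bit inputs and 1-bit weights, a hypothetical 64-row column, and made-up helper names. The idea shown is that if only a few inputs in a bit-plane are active, the column's partial sum cannot exceed that count, so fewer A/D conversion bits are needed.

```python
# Behavioural sketch (assumption: 1-bit inputs/weights, 64-row column).
# All names here are hypothetical and chosen for illustration only.

def count_active_inputs(input_bits):
    """Sparsity sensing: count how many inputs in this bit-plane are 1."""
    return sum(1 for bit in input_bits if bit)

def required_adc_bits(max_sum):
    """Bits needed to represent any integer in [0, max_sum] (at least 1)."""
    return max(1, max_sum.bit_length())

def binary_mac(input_bits, weight_bits):
    """1-bit x 1-bit multiply-accumulate over one column."""
    return sum(x & w for x, w in zip(input_bits, weight_bits))

ROWS = 64                      # assumed column height
input_bits = [0] * ROWS
for row in (3, 17, 42):        # only three inputs are active
    input_bits[row] = 1
weight_bits = [1] * ROWS       # worst case: every stored weight bit is 1

n_active = count_active_inputs(input_bits)
full_bits = required_adc_bits(ROWS)        # range if sparsity is ignored
scaled_bits = required_adc_bits(n_active)  # range after sensing sparsity
partial_sum = binary_mac(input_bits, weight_bits)

# The partial sum can never exceed the number of active inputs.
assert partial_sum <= n_active
print(n_active, full_bits, scaled_bits, partial_sum)
```

In this toy case the conversion range shrinks from 7 bits to 2 bits without losing any information, which is the kind of saving the summary attributes to sparsity-driven DR scaling.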

Abstract

Computation in memory (CIM) overcomes the von Neumann bottleneck by minimizing the communication overhead between memory and process elements. However, using conventional CIM architectures to realize multiply-accumulate operations (MACs) with flexible input and weight bit precision is extremely challenging. This article presents a fully bit-flexible CIM design with a compact area and high energy efficiency. The proposed CIM macro employs a novel multi-functional computing bit cell design by integrating the MAC and the A/D conversion to maximize efficiency and flexibility. Moreover, an embedded input sparsity sensing and a self-adaptive dynamic range (DR) scaling scheme are proposed to minimize the energy-consuming A/D conversions in CIM. Finally, the proposed CIM macro implementation utilizes an interleaved placement structure to enhance the weight-updating bandwidth and the layout symmetry. The proposed CIM design fabricated in standard 28-nm CMOS technology achieves an area efficiency of 27.7 TOPS/mm² and an energy efficiency of 291 TOPS/W, demonstrating a highly energy-area-efficient flexible CIM solution.
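
To make "flexible input and weight bit precision" concrete, here is a minimal sketch of the generic bit-serial decomposition often used to build multi-bit MACs from 1-bit operations. It is an assumption for illustration: the abstract does not say the macro works this way, the operands are taken as unsigned integers, and the function names are hypothetical. Each pair of input and weight bit-planes yields a 1-bit dot product, which is then weighted by its binary significance and accumulated.

```python
# Generic bit-serial MAC sketch (assumption: unsigned operands).
# Not the paper's implementation; names are hypothetical.

def bit_plane(values, k):
    """Extract bit k (0 = LSB) from every value in the list."""
    return [(v >> k) & 1 for v in values]

def bit_serial_mac(inputs, weights, in_bits, w_bits):
    """Dot product of inputs and weights built from 1-bit partial sums."""
    total = 0
    for i in range(in_bits):
        x_plane = bit_plane(inputs, i)
        for j in range(w_bits):
            w_plane = bit_plane(weights, j)
            # 1-bit x 1-bit MAC across the column.
            partial = sum(x & w for x, w in zip(x_plane, w_plane))
            # Shift by the combined significance of the two bit-planes.
            total += partial << (i + j)
    return total

inputs = [3, 5, 7, 0]    # fit in 3 bits
weights = [2, 1, 4, 6]   # fit in 3 bits
direct = sum(x * w for x, w in zip(inputs, weights))
serial = bit_serial_mac(inputs, weights, in_bits=3, w_bits=3)
assert serial == direct
```

Changing `in_bits` and `w_bits` changes only the number of loop iterations, not the 1-bit kernel, which is why this style of decomposition pairs naturally with variable precision.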