JSSC 2023, Issue 5 · Memory · 28nm · SRAM

In Situ Storing 8T SRAM-CIM Macro for Full-Array Boolean Logic and

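Proposes a bidirectional 8T SRAM-CIM macro that performs full-array Boolean logic operations and stores the results in situ, improving computing efficiency and energy efficiency.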

28nm CMOS, 0.66V, 1851.4 GOPS, 270.5 TOPS/W

Compute-in-memory · 8T SRAM · Boolean logic · In situ storage · Energy-efficiency improvement

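▸ Innovation 1: Bidirectional 8T SRAM array structure (circuit innovation): Proposes a bidirectional 8T SRAM array structure that supports reads and writes in two directions, which significantly improves data access efficiency and addresses the result write-back bottleneck of conventional CIM.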

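▸ Innovation 2: In situ storage of computation results (system innovation): Self-cycling 8T cells store computation results in situ within a single cycle, which avoids additional memory overhead and substantially reduces latency and energy consumption.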

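▸ Innovation 3: Single-cycle data row copy (method innovation): By controlling the intermediate transistor in the 8T cell, any data row can be copied within a single cycle, which improves the flexibility and efficiency of data processing.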

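▸ Innovation 4: High performance and energy-efficiency optimization (system innovation): A 16-kb SRAM implemented in 28-nm CMOS achieves a throughput of 1851.4 GOPS and an energy efficiency of up to 270.5 TOPS/W, a 3–56.6× throughput improvement over existing CIM macros, and a 47.5%–63% energy-efficiency improvement on the AES algorithm.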
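To put the headline figures in more familiar units, the following minimal sketch converts an energy efficiency in TOPS/W into energy per operation, and an efficiency gain in percent into relative energy for the same work. The function names are hypothetical, and the second conversion assumes that "energy efficiency" means work done per unit energy.

```python
def energy_per_op_fj(tops_per_watt: float) -> float:
    """Convert TOPS/W to femtojoules per operation.

    1 TOPS/W is 1e12 operations per joule, so one operation costs
    1 / (tops_per_watt * 1e12) joules, which is 1e3 / tops_per_watt femtojoules.
    """
    return 1e3 / tops_per_watt


def relative_energy(efficiency_gain_percent: float) -> float:
    """Energy for the same work relative to the baseline (baseline = 1.0).

    Assumption: efficiency is work per unit energy, so a gain of g percent
    scales the energy by 1 / (1 + g / 100).
    """
    return 1.0 / (1.0 + efficiency_gain_percent / 100.0)


# Reported peak efficiency of 270.5 TOPS/W: about 3.70 fJ per operation.
print(f"{energy_per_op_fj(270.5):.2f} fJ/op")

# Reported AES gain of 47.5% to 63%: roughly 0.68x down to 0.61x of the baseline energy.
print(f"{relative_energy(47.5):.2f}x to {relative_energy(63.0):.2f}x")
```

The 270.5 TOPS/W figure is quoted at a 0.66 V supply, so the per-operation energy applies to that operating point only.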

Abstract

Computing in-memory (CIM) is a promising new computing method to solve problems caused by von Neumann bottlenecks. It mitigates the need for transmitting large amounts of data between the processing and memory units, significantly decreasing the latency and energy consumption. However, writing back the calculation results for CIM can become a new bottleneck if only parallel computing is implemented. This study proposes a bidirectional static random access memory (SRAM) array structure comprising self-cycling eight-transistor (8T) cells, which can achieve full-array Boolean logic operations and read/write in two directions. The CIM results can be restored in in situ bit cells in a single cycle without additional memory. In addition, any data row can be copied into another row by controlling the intermediate transistor in the 8T cell. A 16-kb SRAM was implemented in 28-nm CMOS technology to verify the effectiveness of the proposed design. The throughput of the proposed CIM macro is 1851.4 GOPS. Compared with the existing CIM macros, the throughput increased 3–56.6 times and the energy efficiency was as high as 270.5 TOPS/W at a supply voltage of 0.66 V. When the proposed circuits were applied to advanced encryption standard (AES) algorithms, the energy efficiency was increased by about 47.5%–63% compared to the von Neumann architecture.
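The functional behavior described in the abstract (Boolean logic across the whole array, results kept in the same cells, and row-to-row copy) can be illustrated with a small software model. This is only a behavioral sketch under stated assumptions, not the paper's circuit: the class and method names are hypothetical, the set of operations (AND, OR, XOR) is chosen for illustration, and the model assumes a single operand word shared by every row. Timing, bidirectional access, and the 8T cell itself are not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical behavioral model: each row is a Python int used as a bit vector.
# The supported operations are an illustrative choice, not taken from the paper.
BOOLEAN_OPS: Dict[str, Callable[[int, int], int]] = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


class BehavioralCimArray:
    def __init__(self, rows: int, cols: int) -> None:
        self.cols = cols
        self.mask = (1 << cols) - 1  # keeps every row exactly `cols` bits wide
        self.data: List[int] = [0] * rows

    def write_row(self, row: int, value: int) -> None:
        self.data[row] = value & self.mask

    def read_row(self, row: int) -> int:
        return self.data[row]

    def logic_in_situ(self, op: str, operand: int) -> None:
        """Apply `op` between `operand` and every row; each result overwrites its row.

        Assumption: one operand word is shared by all rows. No separate result
        buffer is used, which mirrors the in situ storage idea at a functional level.
        """
        fn = BOOLEAN_OPS[op]
        self.data = [fn(word, operand) & self.mask for word in self.data]

    def copy_row(self, src: int, dst: int) -> None:
        """Copy one row into another without going through an external buffer."""
        self.data[dst] = self.data[src]


array = BehavioralCimArray(rows=4, cols=8)
array.write_row(0, 0b10101010)
array.write_row(1, 0b11110000)
array.logic_in_situ("xor", 0b00001111)  # all four rows are updated in one call
array.copy_row(0, 3)
print([format(word, "08b") for word in array.data])
# prints ['10100101', '11111111', '00001111', '10100101']
```

In a conventional flow the XOR results would be read out and then written back row by row; here the model overwrites the operand rows directly, which is the write-back step the abstract identifies as the bottleneck.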