JSSC 2024, Issue 9 · Memory · 28nm · CIM

FLEX-CIM: A Flexible Kernel Size 1-GHz 181.6-TOPS/W 25.63-TOPS/mm² Analog Compute-in-Memory Macro

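FLEX-CIM proposes a compute-in-memory (CIM) architecture with a flexible kernel size that improves the energy efficiency and throughput density of CNN accelerators.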

28-nm CMOS, 4-bit MAC precision, 181.6 TOPS/W peak energy efficiency, 25.63 TOPS/mm² peak throughput density

Compute-in-memory · Convolutional neural network · Energy-efficiency optimization · Throughput density · Adaptive ADC

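▸ Innovation 1: Flexible kernel size via an analog partial-sum circuit (circuit innovation). By introducing an analog partial sum (APS) circuit, FLEX-CIM supports flexible convolution kernel sizes. This raises array utilization for layers with small kernels to 99.2% on ResNet-32 and addresses the low utilization of conventional CIM macros.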
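To make the utilization argument concrete, here is a minimal sketch of why a coarse, fixed accumulation granularity wastes rows on small kernels. The column height (1024 rows) and segment size (16 rows) are hypothetical values chosen for illustration, not FLEX-CIM's dimensions, and the model simply assumes that unused segments can be given to other kernels once partial sums are combined; it is not the paper's APS mapping.

```python
import math


def row_utilization(kernel_rows: int, granularity_rows: int) -> float:
    """Fraction of reserved rows that actually store weights when a kernel of
    kernel_rows weights (k * k * c_in) must reserve rows in whole blocks of
    granularity_rows."""
    reserved = math.ceil(kernel_rows / granularity_rows) * granularity_rows
    return kernel_rows / reserved


# Hypothetical column height of a large fixed-kernel CIM macro.
FIXED_COLUMN_ROWS = 1024
# Hypothetical segment size when partial sums let a column be split into
# independently accumulated segments (free segments assumed reusable).
SEGMENT_ROWS = 16

# 3x3 kernel with 16 input channels, as in the first stage of ResNet-32 (CIFAR-10).
kernel_rows = 3 * 3 * 16  # 144 weights per output channel

fixed = row_utilization(kernel_rows, FIXED_COLUMN_ROWS)
flexible = row_utilization(kernel_rows, SEGMENT_ROWS)
print(f"fixed kernel size:    {fixed:.3f}")     # 0.141
print(f"flexible kernel size: {flexible:.3f}")  # 1.000
```

The finer the reservation granularity, the closer the reserved rows track the kernel's real size; the numbers here only illustrate the trend and are not the paper's measured 99.2%.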

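▸ Innovation 2: Overclocked fast multiply–accumulate array for higher throughput (circuit innovation). The fast multiply–accumulate array (FMA) is overclocked, which substantially raises compute throughput, shortens computation time, and further improves overall system performance.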

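▸ Innovation 3: Adaptive-resolution ADC for higher energy efficiency and throughput (circuit innovation). An adaptive-resolution ADC adjusts its resolution dynamically to what each computation requires. This cuts ADC conversion time and energy and raises system energy efficiency and throughput, with a peak energy efficiency of 181.6 TOPS/W.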
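The sketch below illustrates the general idea behind an adaptive-resolution SAR-style ADC: each resolved bit costs one comparator cycle, so resolving only the bits a result can actually occupy saves time and energy. The resolution policy (derived from a hypothetical count of active rows, assuming 1-bit products), the 8-bit full resolution, and the sample value are all assumptions for illustration. This is not FLEX-CIM's actual adaptation rule.

```python
def required_bits(max_value: int) -> int:
    """Smallest unsigned resolution that covers every value in 0..max_value."""
    return max(1, max_value.bit_length())


def sar_convert(v_in: float, lsb: float, bits: int) -> tuple[int, int]:
    """Successive-approximation conversion of v_in to an unsigned code.

    Resolves one bit per comparator cycle, MSB first, and returns
    (code, cycles). Inputs above the range saturate at 2**bits - 1.
    """
    code = 0
    for b in reversed(range(bits)):
        trial = code | (1 << b)
        if v_in >= trial * lsb:  # compare against the trial DAC level
            code = trial
    return code, bits


LSB = 1.0         # hypothetical: one MAC unit per LSB
FULL_RES = 8      # hypothetical fixed ADC resolution
active_rows = 12  # hypothetical: at most 12 rows can add a 1 to this column
v_mac = 9.3       # hypothetical analog MAC result, in LSB units

adaptive_res = required_bits(active_rows)     # 4 bits cover 0..12
print(sar_convert(v_mac, LSB, FULL_RES))      # (9, 8)
print(sar_convert(v_mac, LSB, adaptive_res))  # (9, 4)
```

Both conversions return the same code, but the adaptive one uses half the comparator cycles because the result's range is known to fit in 4 bits.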

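▸ Innovation 4: High-density compute architecture (system innovation). By optimizing the overall architecture, FLEX-CIM achieves high compute density, with a peak throughput density of 25.63 TOPS/mm², substantially increasing compute capability per unit area.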

Abstract

Compute-in-memory (CIM) is a promising approach for realizing energy-efficient convolutional neural network (CNN) accelerators. Previous CIM works demonstrated a high peak energy efficiency of over 100 TOPS/W, with larger fabrics of 1000+ channels. Yet, they typically suffer from low utilization for small CNN layers (e.g., ∼9% for ResNet-32). This low utilization scales down their average energy efficiency, throughput density, and effective memory size by the utilization rate. In addition, the analog-to-digital converter (ADC) occupies most of their computing time (∼90%), further hindering the CIM’s throughput. This work presents FLEX-CIM, fabricated in 28-nm CMOS, featuring: 1) an analog partial sum (APS) circuit to enable a flexible CIM kernel size; 2) an overclocked fast multiply–accumulate array (FMA) to boost the throughput; and 3) an adaptive-resolution ADC to enhance the throughput and energy efficiency. The achieved utilization is 99.2% on ResNet-32. Under 4-bit MAC precision, the peak energy efficiency is 181.6 TOPS/W, and the peak throughput density is 25.63 TOPS/mm².
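The two bottlenecks named in the abstract reduce to simple arithmetic: average metrics scale with utilization, and an ADC that takes ∼90% of compute time bounds any speedup by Amdahl's law. This is a minimal sketch; the 100 TOPS/W peak and the 2× ADC speedup are hypothetical inputs, while the ∼9% utilization and ∼90% ADC time share come from the abstract.

```python
def effective_metric(peak: float, utilization: float) -> float:
    """Average figure of merit when only `utilization` of the array does useful work."""
    return peak * utilization


def adc_speedup(adc_time_fraction: float, adc_factor: float) -> float:
    """Amdahl's-law speedup of total compute time when only the ADC share
    of that time is made adc_factor times faster."""
    return 1.0 / ((1.0 - adc_time_fraction) + adc_time_fraction / adc_factor)


# Hypothetical 100 TOPS/W peak at the ~9% utilization quoted for ResNet-32.
print(f"{effective_metric(100.0, 0.09):.1f} TOPS/W on average")  # 9.0 TOPS/W on average
# ADC at ~90% of compute time: a 2x faster ADC alone gives
print(f"{adc_speedup(0.9, 2.0):.3f}x")  # 1.818x
```

However fast the ADC becomes, the speedup approaches but never exceeds 1 / (1 − 0.9) = 10× while the remaining 10% of compute time is unchanged.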