JSSC 2025, Issue 3: Memory, 28nm, SRAM, CIM
TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Opti
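Proposes the TT@CIM processor, which uses tensor decomposition and bit-level sparsity optimization to improve the energy efficiency of in-memory computing.
28nm CMOS, peak energy efficiency 5.99-691.13 TOPS/W
In-memory computing, tensor decomposition, bit-level sparsity, energy-efficiency optimization, quantization processing
▸Innovation 1: TTD-CIM matched dataflow optimization (system innovation) - A dataflow designed specifically to match tensor-train decomposition (TTD) maximizes utilization of the CIM memory and reduces extra MAC operations, significantly improving computing efficiency and supporting efficient processing of 4/8-bit decomposed DNNs.
▸Innovation 2: Bit-level-sparsity-encoded CIM macro design (circuit innovation) - Proposes a CIM macro with a high-bit-level-sparsity encoding scheme that optimizes the power of MAC operations. Reducing redundant computation lowers the power of each MAC operation and raises energy efficiency to 5.99-691.13 TOPS/W.
▸Innovation 3: Variable-precision quantization method (method innovation) - A quantization unit combined with a lookup table (LUT) dynamically adjusts quantization precision, optimizing the performance and energy efficiency of QuantOp and resolving the quantization bottleneck introduced by TTD.
▸Innovation 4: Tensor decomposition compression (method innovation) - Applies TTD to compress a complete DNN model to within the CIM-SRAM capacity, eliminating the off-chip communication bottleneck. This achieves full-model on-chip storage for the first time and overcomes the storage limit of conventional CIM.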
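To give a feel for two of the ideas above, the sketch below counts the parameters of a weight matrix stored in tensor-train (TT-matrix) format and measures how many bits of a set of integer weights are zero. It is only an illustration under stated assumptions: the 1024 x 1024 layer, its mode factorization, the TT ranks, and the function names `tt_param_count` and `zero_bit_ratio` are hypothetical and do not come from the paper. The bit counting uses plain two's complement, not the paper's encoding scheme.

```python
from math import prod


def tt_param_count(row_modes, col_modes, ranks):
    """Parameter count of a TT-matrix.

    Core k has shape (ranks[k], row_modes[k], col_modes[k], ranks[k + 1]).
    """
    assert len(row_modes) == len(col_modes) == len(ranks) - 1
    assert ranks[0] == ranks[-1] == 1  # boundary TT ranks are always 1
    return sum(
        ranks[k] * m * n * ranks[k + 1]
        for k, (m, n) in enumerate(zip(row_modes, col_modes))
    )


def zero_bit_ratio(weights, bits=8):
    """Fraction of 0 bits over the two's-complement codes of the weights."""
    mask = (1 << bits) - 1  # keep only the low `bits` bits of each weight
    ones = sum(bin(w & mask).count("1") for w in weights)
    return 1 - ones / (bits * len(weights))


# Hypothetical layer: 1024 x 1024 weights, each side factored as 8 * 16 * 8.
row_modes = (8, 16, 8)
col_modes = (8, 16, 8)
ranks = (1, 16, 16, 1)  # assumed TT ranks, chosen only for illustration

dense_params = prod(row_modes) * prod(col_modes)
tt_params = tt_param_count(row_modes, col_modes, ranks)
compression = dense_params / tt_params

# Hypothetical 8-bit weights; -1 is all ones in two's complement.
sample_weights = [1, -1, 0, 64]
sparsity = zero_bit_ratio(sample_weights, bits=8)

print(dense_params, tt_params, round(compression, 1), sparsity)
```

With these assumed shapes the TT cores hold 67,584 parameters against 1,048,576 for the dense matrix, which is the kind of reduction that could let a whole model fit in on-chip SRAM. The zero-bit ratio shows why the encoding matters: a small negative weight such as -1 is dense in ones under two's complement, so an encoding that raises bit-level sparsity leaves fewer active bits per MAC.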
Abstract
Computing-in-memory (CIM) is an attractive
approach for energy-efficient deep neural network (DNN)
processing, especially for low-power edge devices. However,
today’s typical DNNs usually exceed CIM-static random access
memory (SRAM) capacity. The introduced off-chip communication
covers up the benefits of CIM technique, meaning that CIM
processors still encounter the memory bottleneck. To eliminate
this bottleneck, we propose a CIM processor, called TT@CIM,
which applies the tensor-train decom