JSSC 2024, Issue 3 · Digital Circuits · 65nm
FreFlex: A High-Performance Processor for Convolution and Attention Computations
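The FreFlex processor improves the energy efficiency and performance of convolution and attention computations through sparsity-adaptive dynamic frequency modulation and a 2-D systolic array.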
160 GOPS/mm², 1.1 GHz, 0.6–1.0 TOPS/W
AI accelerator · Sparsity · Dynamic frequency modulation · Convolution · Attention
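▸ Innovation 1: Sparsity-adaptive dynamic frequency modulation (SA-DFM) is a system-level innovation that dynamically adjusts the clock frequency to match the sparsity of the input data, raising performance while staying within the power budget. With no sparsity it adds only 7% power overhead; at high sparsity it can deliver a 1.8× performance gain.
▸ Innovation 2: The 2-D systolic-array processing unit is a circuit-level innovation that optimizes the dataflow for convolution and attention computations, improving computational efficiency through parallel processing and data reuse. In 65nm CMOS the design reaches a maximum frequency of 1.1 GHz and a performance density of 160 GOPS/mm².
▸ Innovation 3: Exploiting sparsity for performance is a methodological innovation: the number of zero elements in a layer's output is counted in real time to predict the sparsity of the next layer, and hardware resources are adjusted dynamically in response. This approach significantly improves computational efficiency within the 0.6–1.0 TOPS/W energy-efficiency range.
▸ Innovation 4: Silicon prototype validation demonstrates the design's practical feasibility, achieving high energy efficiency (0.6–1.0 TOPS/W) and high performance (160 GOPS/mm²) at the 65nm CMOS node and offering a scalable solution for sparse-computation hardware.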
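The sparsity-to-frequency idea in the points above can be illustrated in software. This is a minimal sketch, not the paper's control logic: the power model (dynamic power proportional to clock frequency times nonzero density), the `DfmConfig` class, the function names, and the 0.6 GHz baseline clock are all assumptions made for illustration. Only the 1.1 GHz maximum frequency comes from the summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DfmConfig:
    # Hypothetical clock used when the data has no zeros (not from the source).
    f_base_ghz: float = 0.6
    # Maximum clock frequency reported for the 65nm prototype.
    f_max_ghz: float = 1.1


def measure_sparsity(layer_output: Sequence[float]) -> float:
    """Return the fraction of zero elements in one layer's output."""
    if not layer_output:
        return 0.0
    zeros = sum(1 for value in layer_output if value == 0)
    return zeros / len(layer_output)


def select_frequency_ghz(predicted_sparsity: float,
                         cfg: DfmConfig = DfmConfig()) -> float:
    """Pick a clock for the next layer under an assumed power model.

    Assumption: dynamic power ~ frequency * nonzero density, so holding power
    at the dense-case budget gives f = f_base / (1 - sparsity), capped at f_max.
    """
    if not 0.0 <= predicted_sparsity <= 1.0:
        raise ValueError("sparsity must lie in [0, 1]")
    density = 1.0 - predicted_sparsity
    if density == 0.0:
        # All-zero input: nothing to compute, so just use the cap.
        return cfg.f_max_ghz
    return min(cfg.f_max_ghz, cfg.f_base_ghz / density)


# The output of layer N is the input of layer N+1, so its measured sparsity
# serves as the prediction for the next layer.
relu_output = [0.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.7, 0.0]
sparsity = measure_sparsity(relu_output)
next_clock = select_frequency_ghz(sparsity)
print(f"sparsity={sparsity:.3f}, next clock={next_clock:.2f} GHz")
```

Under this assumed model the clock rises as the predicted sparsity grows and saturates at the maximum frequency; a real controller would likely pick from a small set of discrete frequency levels rather than a continuous value.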
Abstract
A high degree of sparsity in machine learning
(ML) models has been highlighted as a significant opportunity
to improve energy and delay efficiencies by skipping the
computation of zero elements in operands. Despite the potential,
its unstructured positions of zeros and a wide range of
sparsity make it challenging to exploit this nature in hardware
implementations that are often built on regular structures.
To address these challenges, this article presents a low-power
and high-performance AI acc