JSSC 2022, Issue 10 · Digital Circuits · 28nm

Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning

Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin

Proposes Trainer, an energy-efficient edge-device training processor that supports dynamic weight pruning

28 nm CMOS, 20.96 mm² area

Edge computing · Sparse training · Energy-efficiency optimization · Dynamic pruning · Batch normalization

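▸ A speculation mechanism eliminates implicit redundant operations

▸ A dynamic-sparsity-adaptive dataflow resolves reuse imbalance

▸ A computational-dependence-decoupled batch normalization unit reduces repeated data access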
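
To make the batch normalization point concrete, the sketch below contrasts the conventional two-pass computation of BN statistics with a single-pass variant. It is only an illustration of one well-known way to remove the dependence of the variance on the mean (using Var[x] = E[x²] − E[x]²). The source does not describe the internals of Trainer's BN unit, so the function names `bn_two_pass`, `bn_one_pass`, and `normalize` are hypothetical and this should not be read as the authors' design.

```python
import math

def bn_two_pass(xs):
    """Conventional statistics: the variance needs the mean first,
    so the activations are read twice."""
    n = len(xs)
    mean = sum(xs) / n                            # read 1
    var = sum((x - mean) ** 2 for x in xs) / n    # read 2, depends on mean
    return mean, var

def bn_one_pass(xs):
    """Decoupled statistics (illustrative assumption): accumulate the sum
    and the sum of squares together, so one read of the data suffices."""
    n = 0
    s = 0.0
    sq = 0.0
    for x in xs:
        n += 1
        s += x
        sq += x * x
    mean = s / n
    var = sq / n - mean * mean                    # Var[x] = E[x^2] - E[x]^2
    return mean, var

def normalize(xs, mean, var, eps=1e-5):
    """Apply the normalization step with precomputed statistics."""
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(x - mean) * inv_std for x in xs]

if __name__ == "__main__":
    acts = [1.0, 2.0, 3.0, 4.0]
    print(bn_two_pass(acts))
    print(bn_one_pass(acts))
```

Both functions return the same statistics up to rounding. The single-pass form trades one read of the data for a subtraction of two similar-sized quantities, which can lose precision in low-precision arithmetic.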

Abstract

Transfer learning, which transfers knowledge from source datasets to target datasets, is practical for adaptive deep neural network (DNN) applications. When considering user privacy and communication bandwidth issues, edge devices’ training is essential for transfer learning. Nevertheless, training requires repeating feedforward (FF), backpropagation (BP), and weight gradient (WG) millions of times, introducing prohibitive computation for edge devices. A promising method to reduce training computation is sparse DNN training (SDT), which dynamically prunes weights during training iterations and performs FF, BP, and WG only with unpruned weights. However, SDT suffers implicit redundancy and reuse imbalance for convolution layers. Besides, it turns bottlenecks into batch normalization (BN) layers. Therefore, it is challenging to achieve energy-efficient SDT computing. This article proposes a processor, Trainer, solving the above challenges with three features. First, a speculation mechanism removes implicit redundant operations, which have nonzeros’ input, weight, or output, but are ineffective for training. Second, a dynamic sparsity adaptive dataflow tackles the reuse imbalance, improving energy efficiency (EE) for dynamic sparse convolution in SDT. Third, a computational dependence decoupled BN unit eliminates BN’s repeated data access to reduce training energy and time. Trainer is fabricated in 28-nm CMOS technology and occupies 20.96 mm² of area. It achieves a peak EE