JSSC 2019, Issue 1 · Digital Circuits · 65 nm · Neural Network Accelerator

UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision

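Proposes UNPU, an energy-efficient deep neural network accelerator that supports variable weight precision and multiple network layer types.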

65 nm CMOS, 0.63 to 1.1 V, 200 MHz, peak performance 345.6 GOPS (16-bit) to 7372 GOPS (1-bit)

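Neural network accelerator · Energy-efficiency optimization · Variable precision · Mobile deep learning · Hardware architecture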

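▸Supports variable weight precision from 1 to 16 bits

▸Lookup-table (LUT)-based bit-serial processing elements reduce energy consumption

▸Unified architecture improves peak convolutional-layer performance by 1.15×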
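The first two highlights rest on bit-serial arithmetic: a weight is consumed one bit per cycle, so precision becomes a runtime knob rather than a fixed datapath width. Below is a minimal Python sketch of a bit-serial dot product, assuming plain two's-complement weights at every width (the paper's actual 1-bit encoding and datapath may differ); `weight_bits` and `bit_serial_dot` are hypothetical names, not UNPU interfaces. Each loop iteration stands for one processing cycle, so an n-bit weight costs n cycles.

```python
def weight_bits(w: int, n_bits: int) -> list[int]:
    """Two's-complement bits of w (LSB first) for an n_bits-wide weight."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= w <= hi:
        raise ValueError(f"{w} does not fit in {n_bits}-bit two's complement")
    u = w & ((1 << n_bits) - 1)
    return [(u >> b) & 1 for b in range(n_bits)]


def bit_serial_dot(acts: list[int], weights: list[int], n_bits: int) -> tuple[int, int]:
    """Return (dot product, cycles), processing one weight bit-plane per cycle."""
    bits = [weight_bits(w, n_bits) for w in weights]
    acc, cycles = 0, 0
    for b in range(n_bits):
        # Add up the activations whose weight has bit b set (no multiplier needed).
        plane = sum(a for a, wb in zip(acts, bits) if wb[b])
        # In two's complement the MSB carries a negative weight.
        scale = -(1 << b) if b == n_bits - 1 else 1 << b
        acc += scale * plane  # shift-and-accumulate
        cycles += 1
    return acc, cycles
```

For example, `bit_serial_dot([3, -2, 5], [1, -3, 2], 4)` gives `(19, 4)`; the same weights at 8-bit precision give the same sum in 8 cycles, which is the precision/throughput trade-off the variable-precision design exploits.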

Abstract

An energy-efficient deep neural network (DNN) accelerator, the unified neural processing unit (UNPU), is proposed for mobile deep learning applications. The UNPU supports both convolutional layers (CLs) and recurrent or fully connected layers (FCLs), so it can accelerate the varied workload combinations found in mobile deep learning applications. In addition, the UNPU is the first DNN accelerator ASIC that supports fully variable weight bit precision from 1 to 16 bit, which lets it operate at the accuracy-energy optimal point. Moreover, the lookup table (LUT)-based bit-serial processing element (LBPE) in the UNPU reduces energy consumption compared with a conventional fixed-point multiply-and-accumulate (MAC) array by 23.1%, 27.2%, 41%, and 53.6% for 16-, 8-, 4-, and 1-bit weight precision, respectively. Besides improving energy efficiency, the unified DNN core architecture of the UNPU improves the peak performance for CLs by 1.15× compared with previous work. This allows the UNPU to run a given DNN at a lower voltage and frequency, further increasing energy efficiency. The UNPU is implemented in 65-nm CMOS technology and occupies a 4 × 4 mm² die area. It operates from a 0.63- to 1.1-V supply voltage with a maximum frequency of 200 MHz. The UNPU has a peak performance of 345.6 GOPS for 16-bit weight precision and 7372 GOPS for 1-bit weight precision. The wide operating range of the UNPU enables it to achieve a power efficiency of 3.08 TO
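One common way a LUT can replace additions in a bit-serial PE is sketched below: for a small group of activations, all 2^G partial sums are precomputed once, and each weight bit-plane then selects one entry with a G-bit index instead of performing up to G additions. Because the activations are shared by every output channel, one table serves many weight rows. This is a minimal Python sketch of that general idea under stated assumptions (two's-complement weights, a group size of 4, hypothetical names `build_lut` and `lut_matvec`); it is not the exact LBPE organization from the paper.

```python
def to_unsigned(w: int, n_bits: int) -> int:
    """Encode w as an n_bits-wide two's-complement bit pattern."""
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if not lo <= w <= hi:
        raise ValueError(f"{w} does not fit in {n_bits} bits")
    return w & ((1 << n_bits) - 1)


def build_lut(acts: list[int]) -> list[int]:
    """lut[idx] = sum of acts[i] over every bit i set in idx (2**len(acts) entries)."""
    lut = [0] * (1 << len(acts))
    for idx in range(1, len(lut)):
        low = idx & -idx  # lowest set bit of idx
        lut[idx] = lut[idx ^ low] + acts[low.bit_length() - 1]
    return lut


def lut_matvec(acts: list[int], weight_rows: list[list[int]],
               n_bits: int, group: int = 4) -> list[int]:
    """One dot product per weight row, computed bit-serially through shared LUTs."""
    out = [0] * len(weight_rows)
    for start in range(0, len(acts), group):
        # Build one LUT per activation group; every weight row (output channel) reuses it.
        lut = build_lut(acts[start:start + group])
        for r, row in enumerate(weight_rows):
            codes = [to_unsigned(w, n_bits) for w in row[start:start + group]]
            for b in range(n_bits):  # one weight bit-plane per "cycle"
                # The group's bit-b values form the LUT index.
                idx = 0
                for i, code in enumerate(codes):
                    idx |= ((code >> b) & 1) << i
                # The MSB of a two's-complement weight carries -2**(n_bits-1).
                scale = -(1 << b) if b == n_bits - 1 else 1 << b
                out[r] += scale * lut[idx]
    return out
```

For instance, `lut_matvec([3, -2, 5, 1, 4], [[1, -3, 2, 0, -1], [-4, 7, -8, 2, 3]], 4)` returns `[15, -52]`, matching ordinary dot products. The cost of building each table is amortized over all output channels, which is one plausible reading of why a LUT-based design can beat a MAC array on energy, with the gain growing as weight precision shrinks.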