JSSC 2022, Issue 3, Digital Circuits

A Neural Network Training Processor With 8-Bit Shared Exponent Bias Floating Point and Multiple-Way Fused Multiply-Add Trees

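Proposes a neural network training processor based on 8-bit floating point with shared exponent bias, supporting high-performance low-precision training.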

Silicon-verified DNN training processor using 24-way FMA trees

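Neural network training, low-precision computing, floating-point arithmetic, hardware accelerator, energy-efficiency optimization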

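▸ Innovation 1: 8-bit floating point with shared exponent bias (FP8-SEB) format (method innovation). By sharing the exponent bias, FP8-SEB addresses the insufficient precision of conventional 8-bit floating point in deep neural network training, bringing training performance close to, or even above, that of FP32.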

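▸ Innovation 2: multiple-way fused multiply-add (FMA) tree design (circuit innovation). A 24-way FMA tree structure raises numerical precision and lowers energy consumption; energy efficiency improves by 2.48× over conventional designs.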

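▸ Innovation 3: flexible 2-D routing scheme (system innovation). The 2-D routing scheme optimizes data-transfer paths, reducing hardware resource usage and latency and improving overall system efficiency; energy consumption is 78.1× lower than that of a standard GPU.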

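▸ Innovation 4: silicon-verified DNN training processor (system innovation). The processor combines the FP8-SEB format with multiple-way FMA trees to achieve energy-efficient, low-power neural network training, suitable for data centers and edge devices.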
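
The first two points above (a shared exponent bias and a multiple-way FMA tree) can be illustrated with a small numerical sketch. The Python below is not the processor's implementation; it is a minimal model under stated assumptions. The 1-4-3 bit split (sign, exponent, mantissa), the rule that picks the shared bias from the largest magnitude in a tensor, the subnormal handling, and rounding the running sum back to the 8-bit grid are all assumptions made only for illustration. The function names `choose_shared_bias`, `quantize`, `dot_chained`, and `dot_fma_tree` are hypothetical.

```python
import math

EXP_BITS = 4  # assumed exponent width (not given in the notes above)
MAN_BITS = 3  # assumed mantissa width (not given in the notes above)


def choose_shared_bias(values):
    """Pick one exponent bias for a whole tensor (assumed rule):
    place the largest magnitude in the top exponent bin."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0
    top_exp = math.frexp(max_abs)[1] - 1  # floor(log2(max_abs))
    return (2 ** EXP_BITS - 1) - top_exp


def quantize(x, bias):
    """Round x to the nearest value of the assumed 8-bit format.
    Stored exponent = true exponent + bias; stored exponent 0 is
    treated as subnormal, and overflow saturates."""
    if x == 0.0:
        return 0.0
    e_max = (2 ** EXP_BITS - 1) - bias
    e_min = 1 - bias
    e = math.frexp(abs(x))[1] - 1
    e = min(max(e, e_min), e_max)
    step = math.ldexp(1.0, e - MAN_BITS)  # spacing of representable values
    q = round(x / step) * step
    max_val = (2.0 - 2.0 ** -MAN_BITS) * 2.0 ** e_max
    return max(-max_val, min(max_val, q))


def dot_chained(a, b, bias):
    """Baseline: round after every multiply-add."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = quantize(acc + x * y, bias)
    return acc


def dot_fma_tree(a, b, bias, ways=24):
    """Multiple-way fused version: sum `ways` products without
    intermediate rounding (math.fsum stands in for a wide adder
    tree), then round once per group."""
    acc = 0.0
    for i in range(0, len(a), ways):
        group = zip(a[i:i + ways], b[i:i + ways])
        partial = math.fsum(x * y for x, y in group)
        acc = quantize(acc + partial, bias)
    return acc
```

In this model, a fixed bias of 7 rounds small gradient-like values such as 3e-6, 1e-6, and -2e-6 to zero, while the bias of 34 returned by `choose_shared_bias` keeps each of them within 6.25% relative error. For a 24-element dot product of one 1.0 term and twenty-three 0.01 terms (exact result 1.23), `dot_chained` with bias 7 returns 1.0 because every small term is rounded away, whereas `dot_fma_tree` returns 1.25.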

Abstract

Recent advances in deep neural networks (DNNs) and machine learning algorithms have induced the demand for services based on machine learning algorithms that require a large number of computations, and specialized hardware ranging from accelerators for data centers to on-device computing systems has been introduced. Low-precision math such as 8-bit integers has been used in neural networks for energy-efficient neural network inference, but training with low-precision numbers without performance degradation has remained a challenge. To overcome this challenge, this article presents an 8-bit floating-point neural network training processor for state-of-the-art non-sparse neural networks. As naïve 8-bit floating-point numbers are insufficient for training DNNs robustly, two additional methods are introduced to ensure high-performance DNN training. First, a novel numeric system which we dub 8-bit floating point with shared exponent bias (FP8-SEB) is introduced. Moreover, multiple-way fused multiply-add (FMA) trees are used in FP8-SEB's hardware implementation to ensure higher numerical precision and reduced energy. The FP8-SEB format combined with multiple-way FMA trees is evaluated under various scenarios to show a trained-from-scratch performance that is close to or even surpasses that of current networks trained with full precision (FP32). Our silicon-verified DNN training processor utilizes 24-way FMA trees implemented with FP8-SEB math and flexible 2-D routing schemes