JSSC 2022, Issue 3, Digital Circuits
A Neural Network Training Processor With 8-Bit Shared Exponent Bias Floating Point…
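Proposes a neural network training processor based on 8-bit shared exponent bias floating point, supporting high-performance, low-precision training.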
Silicon-verified DNN training processor with 24-way FMA trees
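Neural network training, Low-precision computing, Floating-point arithmetic, Hardware accelerators, Energy-efficiency optimization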
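▸Innovation 1: 8-bit shared exponent bias floating point (FP8-SEB) format (method innovation). By sharing an exponent bias, FP8-SEB overcomes the insufficient precision of conventional 8-bit floating point in DNN training, substantially improving training results to a level close to, and in some cases above, FP32 accuracy.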
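▸Innovation 2: Multiple-way fused multiply-add (FMA) tree design (circuit innovation). A 24-way FMA tree structure improves computational accuracy while reducing energy consumption, giving 2.48× higher energy efficiency than a conventional design.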
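▸Innovation 3: Flexible 2D routing scheme (system innovation). The 2D routing scheme optimizes data-transfer paths, reducing hardware resource usage and latency and improving overall system efficiency; energy consumption is 78.1× lower than that of a standard GPU.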
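▸Innovation 4: Silicon-verified DNN training processor (system innovation). Combining the FP8-SEB format with multiple-way FMA trees, the processor achieves high-efficiency, low-power neural network training suited to both data centers and edge devices.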
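The two numeric ideas in the list above can be illustrated with a small Python sketch. It is not the paper's design: the 1-4-3 bit layout, the per-tensor bias rule in `choose_shared_bias`, the 10-bit accumulator, and every function name are hypothetical assumptions chosen for illustration. The first part shows why a shared exponent bias can help: small gradient-like values that a fixed bias of 7 flushes to zero stay representable once the bias is shifted to match the tensor's range. The second part contrasts chained two-operand FMAs, which round after every product, with a multi-operand sum that rounds once; single rounding is one plausible reason a multi-way FMA tree improves accuracy, stated here as an assumption rather than the paper's explanation.

```python
import math

# Hypothetical FP8 layout: 1 sign, 4 exponent, 3 mantissa bits (assumed, not from the paper).
EXP_BITS = 4
MAN_BITS = 3
MAX_FIELD = 2 ** EXP_BITS - 1  # largest exponent field; inf/NaN encodings ignored here


def round_to_format(x, man_bits, emin, emax):
    """Round x to the nearest value (ties to even) of a binary float format with
    `man_bits` fraction bits and exponent range [emin, emax]; saturate on overflow."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, emin)             # below emin, keep the fixed subnormal spacing
    ulp = 2.0 ** (exp - man_bits)
    q = round(abs(x) / ulp) * ulp      # Python's round() breaks ties to even
    max_val = (2.0 - 2.0 ** -man_bits) * 2.0 ** emax
    return sign * min(q, max_val)


def choose_shared_bias(values):
    """Hypothetical rule: one bias per tensor, chosen so that the top exponent
    field lines up with the exponent of the tensor's largest magnitude."""
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return 2 ** (EXP_BITS - 1) - 1   # fall back to the standard bias (7)
    _, e = math.frexp(amax)
    return MAX_FIELD - (e - 1)


def quantize_fp8(values, bias):
    """Quantize with exponent fields 1..MAX_FIELD, i.e. real exponents
    [1 - bias, MAX_FIELD - bias]; field 0 acts as the subnormal range."""
    return [round_to_format(v, MAN_BITS, 1 - bias, MAX_FIELD - bias) for v in values]


def chained_fma_dot(a, b, acc_man_bits):
    """Two-operand FMAs in sequence: the accumulator is rounded after every product."""
    acc = 0.0
    for x, y in zip(a, b):
        acc = round_to_format(acc + x * y, acc_man_bits, -1000, 1000)
    return acc


def tree_fma_dot(a, b, acc_man_bits):
    """Multi-operand sum: all products are added at high precision and rounded once.
    math.fsum (correctly rounded float64) stands in for a wide internal adder tree."""
    exact = math.fsum(x * y for x, y in zip(a, b))
    return round_to_format(exact, acc_man_bits, -1000, 1000)


grads = [3e-4, -1.2e-4, 5e-5, 2.5e-4]        # small, gradient-like values
print(quantize_fp8(grads, bias=7))           # standard bias: every value flushes to zero
seb = choose_shared_bias(grads)
print(seb, quantize_fp8(grads, seb))         # shared bias keeps all four values nonzero

a = [1.0] + [2.0 ** -4] * 16                 # one large product plus 16 products of 2**-11
b = [1.0] + [2.0 ** -7] * 16
print(chained_fma_dot(a, b, 10), tree_fma_dot(a, b, 10))
```

With the standard bias the four values print as zeros (one as `-0.0`); with the shared bias (27 for this input) each value keeps a relative error below 2^-4. In the dot product, the chained version returns 1.0 because each 2^-11 product lands exactly halfway between two 10-bit accumulator values and rounds back to 1.0, while the single-rounding version returns 1.0078125.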
Abstract
Recent advances in deep neural networks (DNNs) and machine learning algorithms have induced the demand for services based on machine learning algorithms that require a large number of computations, and specialized hardware ranging from accelerators for data centers to on-device computing systems has been introduced. Low-precision math such as 8-bit integers has been used for energy-efficient neural network inference, but training with low-precision numbers without performan