JSSC 2023, Issue 1 | Digital Circuits | 4nm | Neural Network Accelerator
A Multi-Mode 8k-MAC HW-Utilization-Aware Neural Processing Unit With a Unified Mu
A multi-mode 8k-MAC neural processing unit in a 4-nm process that supports multi-precision computation and optimizes hardware utilization.
4.26 TFLOPS/W (FP16), 11.59 TOPS/W (INT8), 1.72 TFLOPS/mm², 3.45 TOPS/mm²
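Keywords: neural processing unit, multi-precision computation, hardware utilization, dynamic operation modes, energy-efficiency optimization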
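▸ Unified multi-precision MACs support INT4/8/16 and FP16 data
▸ Dynamically reconfigures the computational flow to improve hardware utilization
▸ Supports dynamic operation modes ranging from ultra-low power to low latency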
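To illustrate why hardware utilization matters for layers with few input channels (for example depthwise convolution or an RGB input layer), here is a minimal sketch of a toy utilization model. The split of the 8k MACs into 32 input-channel lanes by 256 output-channel lanes, and the function name `mac_utilization`, are assumptions made for illustration only; they are not taken from the paper and do not describe its actual datapath or its reconfiguration scheme.

```python
import math


def mac_utilization(in_channels: int, out_channels: int,
                    ic_lanes: int = 32, oc_lanes: int = 256) -> float:
    """Fraction of MAC slots doing useful work in a toy array.

    Hypothetical model: the array multiplies ic_lanes input channels by
    oc_lanes output channels per pass (32 * 256 = 8192 MACs). Lanes that
    have no channel mapped to them in a pass are counted as idle.
    """
    ic_passes = math.ceil(in_channels / ic_lanes)
    oc_passes = math.ceil(out_channels / oc_lanes)
    useful = in_channels * out_channels          # MACs that are actually needed
    issued = ic_passes * ic_lanes * oc_passes * oc_lanes  # MAC slots occupied
    return useful / issued


# Dense layer that fills every lane.
print(mac_utilization(256, 256))   # 1.0
# Shallow first layer: 3 input channels (RGB), 64 output channels.
print(mac_utilization(3, 64))      # 0.0234375
# Depthwise convolution, modeled as 1 input channel per output channel.
print(mac_utilization(1, 256))     # 0.03125
```

In this toy model most lanes sit idle when only a handful of input channels are available, which is the kind of situation a reconfigurable computational flow is meant to address.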
Abstract
This article presents an 8k-multiply-accumulate (MAC) neural processing unit (NPU) in 4-nm mobile system-on-chip (SoC). The unified multi-precision MACs support from integer (INT)4/8/16 to floating point (FP)16 data with high area and energy efficiency. When the NPU meets some layers having low hardware (HW) utilization, such as depthwise convolution or shallow layers with a few input channels, the NPU reconfigures the computational flow to enhance the utilization up to four times after getting ba