Abstract
Reduced-precision computation is a key enabling
factor for energy-efficient acceleration of deep learning (DL)
applications. This article presents a 7-nm four-core mixed-
precision artificial intelligence (AI) chip that supports four
compute precisions, namely FP16, Hybrid-FP8 (HFP8), INT4, and
INT2, to meet diverse application demands for training
and inference. The chip leverages recent algorithmic
advances to demonstrate leading-edge power efficiency for 8-bit
floating-point (FP8) training and IN