JSSC 2022, Issue 1 · Other · 7nm

A 7-nm Four-Core Mixed-Precision AI Chip With 26.2-TFLOPS Hybrid-FP8 Training, 104.9-TOPS INT4 Inference, and Workload-Aware

A 7-nm four-core mixed-precision AI chip that supports FP16, HFP8, INT4, and INT2 compute precisions for efficient deep learning training and inference.

26.2-TFLOPS Hybrid-FP8

Mixed precision · AI chip · Deep learning · FP8 training · INT4 inference

▸ Supports four compute precisions (FP16, HFP8, INT4, INT2)

▸ Built in a 7-nm process for high energy efficiency

▸ Supports 8-bit floating-point (FP8) training and INT4 inference

Abstract

Reduced precision computation is a key enabling factor for energy-efficient acceleration of deep learning (DL) applications. This article presents a 7-nm four-core mixed-precision artificial intelligence (AI) chip that supports four compute precisions, FP16, Hybrid-FP8 (HFP8), INT4, and INT2, to support diverse application demands for training and inference. The chip leverages cutting-edge algorithmic advances to demonstrate leading-edge power efficiency for 8-bit floating-point (FP8) training and INT4 inference without model

Manuscript received May 14, 2021; revised August 8, 2021 and September 27, 2021; accepted October 1, 2021. Date of publication November 10, 2021; date of current version December 29, 2021. This article was approved by Associate Editor Jun Deguchi. (Corresponding author: Sae Kyu Lee.)

Sae Kyu Lee, Ankur Agrawal, Joel Silberman, Matthew Ziegler, Swagath Venkataramani, Nianzheng Cao, Bruce Fleischer, Michael Guillorn, Matthew Cohen, Martin Lutz, Jinwook Jung, Siyu Koswatta, Ching Zhou, Vidhi Zalani, Monodeep Kar, Chia-Yu Chen, Alyssa Herbert, Radhika Jain, Kyu-Hyoun Kim, Yulong Li, Zhibin Ren, Marcel Schaal, Michael R. Scheuermann, Xiao Sun, Hung Tran, Naigang Wang, Wei Wang, Xin Zhang, Vijayalakshmi Srinivasan, Pong-Fei Lu, Sunil Shukla, Kailash Gopalakrishnan, and Leland Chang are with the IBM T. J. Watson Research Center, Yorktown Heights, NY 10598 USA (e-mail: saekyu.lee@ibm.com).

Mingu Kang was with IBM Research, Yorktown Heights, NY 10598 USA. He