ISSCC 2023
Session 33
AI / ML
A 28nm 2Mb STT-MRAM Computing-in-Memory Macro with a Refined Bit-Cell and 22.4 – 41.5TOPS/W for AI Inference
Emerging non-volatile memory-based computing-in-memory (CIM) is an excellent fit for resource-constrained edge-AI devices [1-6]. MRAM-CIM macros for MAC operations, at present, rely on a crossbar structure or a periphera
ISSCC 2023
Session 33
AI / ML
A 9Mb HZO-Based Embedded FeRAM with 10^12-Cycle Endurance and 5/7ns Read/Write using ECC-Assisted Data Refresh and Offset-Canceled Sense Amplifier
University, Shanghai, China 3 Zhejiang Lab, Hangzhou, China 1 voltage (VT) has a positive coefficient of about +0.085mV/°C. By setting a proper R2:R3 ratio, a tracking voltage (Vtrack) can be generated with an expected r
ISSCC 2023
Session 33
AI / ML
A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4-160.1TOPS/W for Edge-AI Devices
Yu-An Chien1, Guan-Yi Lin1, Po-Jung Chen1, Tsen-Hsiang Pan1, De-Qi You1, Fang-Yi Chen1, Andrew Lee1, Chung-Chuan Lo1, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1, Yu-Der Chih3, Tsung-Yung Chang3, Meng-Fan Chang1,2
ISSCC 2023
Session 33
AI / ML
A 16nm 32Mb Embedded STT-MRAM with a 6ns Read-Access
Po-Hao Lee, Chia-Fu Lee, Yi-Chun Shih, Hon-Jarn Lin, Yen-An Chang, Cheng-Han Lu, Yu-Lin Chen, Chieh-Pu Lo, Chung-Chieh Chen, Cheng-Hsiung Kuo, Tan-Li Chou, Chia-Yu Wang, J. J. Wu, Roger Wang, Harry Chuang, Yih Wang, Yu-D
ISSCC 2023
Session 32
AI / ML
SciCNN: A 0-Shot-Retraining Patient-Independent Epilepsy-Tracking SoC
Institute for Health, Singapore, Singapore 3 Apple, Cupertino, CA 4 Huawei Technologies, Chengdu, China 1 2 Patient-specific seizure-detection SoCs targeting ambulatory seizure treatment [1-8] achieve outstanding accurac
ISSCC 2023
Session 29
AI / ML
Wafer-Level Stacking of High-Density Capacitors to Enhance the Performance of a Large Multicore Processor for Machine Learning Applications
Graphcore, Bristol, United Kingdom; Graphcore, Adelaide, Australia
The 822mm2 Colossus Mk2x is a chip made by stacking and fusing two separately processed 12-inch wafers prior to probe-test, singulation and packaging.
ISSCC 2023
Session 29
AI / ML
Snap-SAT: A One-Shot Energy-Performance-Aware All-Digital Compute-in-Memory Solver for Large-Scale Hard Boolean Satisfiability Problems
Sirish Oruganti, Jaydeep P. Kulkarni University of Texas, Austin, TX Boolean satisfiability (SAT) is a non-deterministic polynomial time (NP)-complete problem with many practical and industrial data-intensive application
ISSCC 2023
Session 29
AI / ML
A 32.5mW Mixed-Signal Processing-in-Memory-Based k-SAT Solver in 65nm CMOS with 74.0% Solvability for 30-Variable 126-Clause 3-SAT Problems
Boolean satisfiability (k-SAT, k ≥3) is an NP-complete combinatorial optimization problem (COP) with applications in communication, flight network, supply chain and finance, to name a few. The ASICs for SAT and other COP
ISSCC 2023
Session 22
AI / ML
A 12nm 18.1TFLOPs/W Sparse Transformer Processor with
Thierry Tambe1, Jeff Zhang1, Coleman Hooper1, Tianyu Jia2, Paul N. Whatmough1,3, Joseph Zuckerman4, Maico Cassel Dos Santos4, Erik Jens Loscalzo4, Davide Giri4, Kenneth Shepard4, Luca Carloni4, Alexander Rush5, David Bro
ISSCC 2023
Session 22
AI / ML
ANP-I: A 28nm 1.5pJ/SOP Asynchronous Spiking Neural Network Processor Enabling Sub-0.1µJ/Sample On-Chip Learning for Edge-AI Applications
With the development of on-chip learning processors for edge-AI applications, the energy efficiency of NN inference and training is increasingly critical. As on-chip training energy dominates the energy consumption of edge-
ISSCC 2023
Session 22
AI / ML
C-DNN: A 24.5-85.8TOPS/W Complementary-Deep-Neural-Network Processor with Heterogeneous CNN/SNN Core Architecture and Forward-Gradient-Based Sparsity Generation
have been shown to achieve the same accuracy as Convolutional-Neural-Networks (CNNs). By using CNN-to-SNN conversion, SNNs become a promising candidate for ultra-low power AI applications [1]. For example, compared to BN
ISSCC 2023
Session 22
AI / ML
A 127.8TOPS/W Arbitrarily Quantized 1-to-8b Scalable-Precision Accelerator for General-Purpose Deep Learning
deep learning accelerators has focused on inference tasks to improve performance by means of maximally utilizing sparsity and quantization. Unlike CNN-only networks, however, recent state-of-the-art (SOTA) models consist
ISSCC 2023
Session 2
AI / ML
VISTA: A 704mW 4K-UHD CNN Processor for Video and Image Spatial/Temporal Interpolation Acceleration
Video convolutional neural networks (CNNs) have achieved great success in high-resolution imaging applications, such as video super-resolution (VSR), and have demonstrated superior quality and temporal consistency by leveraging
ISSCC 2023
Session 16
AI / ML
A 40-310TOPS/W SRAM-Based All-Digital Up to 4b In-Memory Computing Multi-Tiled NN Accelerator in FD-SOI 18nm for Deep-Learning Edge Applications
Harsh Rawat2, Hitesh Chawla2, Abhijith VS2, Paolo Zambotti4, Akhilesh Sharma2, Carmine Cappetta1, Michele Rossi1, Antonio De Vita1, Francesca Girardi1 STMicroelectronics, Cornaredo, Italy STMicroelectronics, Noida, India
ISSCC 2023
Session 16
AI / ML
A Nonvolatile AI-Edge Processor with 4MB SLC-MLC Hybrid-Mode ReRAM Compute-in-Memory Macro and 51.4-251TOPS/W
Yun-Chen Lo1, Chuan-Jia Jhang1,2, Hung-Hsi Hsu1, Yu-Hsiang Chin1, Yu-Chiao Chen1, Chung-Chuan Lo1, Ren-Shuo Liu1, Kea-Tiong Tang1, Chih-Cheng Hsieh1, Yu-Der Chih3, Tsung-Yung Chang3, Meng-Fan Chang1,2 National Tsing Hua
ISSCC 2023
Session 16
AI / ML
A 28nm 53.8TOPS/W 8b Sparse Transformer Accelerator with In-Memory Butterfly Zero Skipper for Unstructured-Pruned NN and CIM-Based Local-Attention-Reusable Engine
Transformer networks, from BERT, GPT to Alphafold, have demonstrated unprecedented advances in a variety of AI tasks. Fig. 16.2.1 shows the computing flow of self-attention – the fundamental operation in transformers. Qu
ISSCC 2023
Session 16
AI / ML
MulTCIM: A 28nm 2.24µJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers
natural language, speech, etc. Multimodal Transformer (MulT, Fig. 16.1.1) models introduce a cross-modal attention mechanism to vanilla transformers to learn from different modalities, achieving excellent results on mult
ISSCC 2023
Session 13
AI / ML
Crystalline Oxide Semiconductor-based 3D Bank Memory System for Endpoint Artificial Intelligence with Multiple Neural Networks Facilitating Context Switching and Power Gating
the maximum frequency. Energy for inference (MNIST) using only the CPU memory and the core is 1681.97µJ, whereas energy for inference using the ACC is 0.19µJ. The inference time is reduced from 3.55s to 485µs. Therefore, our ACC enables inference according to the frame rate of imaging data (e.g., 60fps and 16ms).
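The figures quoted in this fragment can be checked with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, the inputs are the four numbers stated above, and the frame period is simply derived as 1/60 s rather than taken from the paper.

```python
# Values quoted in the abstract fragment above (names are illustrative).
CPU_ENERGY_UJ = 1681.97   # inference energy using only CPU memory and core
ACC_ENERGY_UJ = 0.19      # inference energy using the accelerator (ACC)
CPU_TIME_S = 3.55         # inference time without the ACC
ACC_TIME_S = 485e-6       # inference time with the ACC
FRAME_RATE_FPS = 60       # example frame rate mentioned in the text

# Ratios implied by the quoted numbers.
energy_ratio = CPU_ENERGY_UJ / ACC_ENERGY_UJ   # roughly 8.9e3
speedup = CPU_TIME_S / ACC_TIME_S              # roughly 7.3e3

# Frame period at 60fps (about 16.7ms), derived here, not quoted.
frame_period_s = 1.0 / FRAME_RATE_FPS
fits_in_frame = ACC_TIME_S < frame_period_s

print(f"energy reduction: {energy_ratio:.0f}x")
print(f"latency reduction: {speedup:.0f}x")
print(f"frame period: {frame_period_s * 1e3:.1f} ms, fits: {fits_in_frame}")
```

On these numbers, the 485µs inference occupies only a small fraction of one 60fps frame period, which is consistent with the claim that the ACC can keep up with the frame rate.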
Masashi Fujita1, Munehiro Kozuma1, Yoshinori Ando1, Yoshiyuki Kurokawa1, Toru Nakura2, Shunpei Yamazaki1 The effect of power reduction when performing context switching and PG is compared between an OS/Si chip and a Si (
ISSCC 2022
Session 34
AI / ML
Side-Channel Attack Counteraction via Machine Learning-Targeted Power Compensation for Post-Silicon HW Security Patching
Southern University of Science and Technology, Shenzhen, China
*Equally Credited Authors (ECAs)
Counteracting side-channel attacks has become a basic requirement in secure integrated circuits handling physical or sen
ISSCC 2022
Session 33
AI / ML
DSPU: A 281.6mW Real-Time Depth Signal Processing Unit for Deep Learning-Based Dense RGB-D Data Acquisition with Depth Fusion and 3D Bounding Box Extraction in Mobile Platforms
RGB-D data and 3D bounding-box (BB) information for accurate navigation and seamless interaction with the surrounding environment. Specifically, the extraction of RGB-D data and 3D BB needs to be done in real-time (> 30fp
ISSCC 2022
Session 29
AI / ML
ReckOn: A 28nm Sub-mm2 Task-Agnostic Spiking Recurrent Neural Network Processor Enabling On-Chip Learning over Second-Long Timescales
The robustness of autonomous inference-only devices deployed in the real world is limited by data distribution changes induced by different users, environments, and task requirements. This challenge calls for the develop
ISSCC 2022
Session 29
AI / ML
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes
state-of-the-art results in many fields, like natural language processing and computer vision, but their large number of matrix multiplications (MM) result in substantial data movement and computation, causing high laten
ISSCC 2022
Session 29
AI / ML
A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing
code generator performs cascaded OR (AND) for positive (negative) data via 6b MSBs to generate a 6b signal with more 0-bits for small values to disable compressors. For exact mode, it modifies 2 BVs to “0” to make approx
ISSCC 2022
Session 22
AI / ML
An 82nW 0.53pJ/SOP Clock-Free Spiking Neural Network with 40µs Latency for AIoT Wake-Up Functions Using Ultimate-Event-Driven Bionic Architecture and Computing-in-Memory Technique
Peiyu Chen1, Meng Wu1, Hao Zhang1, Peng Zhou3, Jinguang Liu3, Guangyu Sun1, Jiayoon Ru1, Le Ye1,2, Ru Huang1 Peking University, Beijing, China Advanced Institute of Information Technology of Peking University, Hangzhou,
ISSCC 2022
Session 22
AI / ML
A 108nW 0.8mm2 Analog Voice Activity Detector (VAD) Featuring a Time-Domain CNN as a Programmable Feature Extractor and a Sparsity-Aware Computational Scheme in 28nm CMOS
University of Lisboa, Lisbon, Portugal
An ultra-low-power always-on voice activity detector (VAD) is the key enabler of acoustic sensing in wearables. The VAD listens to the environment and wakes up the main system o
ISSCC 2022
Session 19
AI / ML
A 28GHz Compact 3-Way Transformer-Based Parallel-Series Doherty Power Amplifier with 20.4%/14.2% PAE at 6-/12-dB Power Back-Off and 25.5dBm PSAT in 55nm Bulk CMOS
University of Electronic Science and Technology of China, Chengdu, China
The 28GHz band for fifth-generation (5G) millimeter-wave wireless communication supporting high-order QAM, OFDM, and carrier aggregation (CA) r
ISSCC 2022
Session 16
AI / ML
A 40nm 64kb 26.56TOPS/W 2.37Mb/mm2 RRAM Binary/Compute-in-Memory Macro with 4.23× Improvement in Density and >75% Use of Sensing Dynamic Range
Compute-in-Memory (CIM) using emerging nonvolatile (eNVM) memory technologies, such as resistive random-access memory (RRAM), has been shown by several implemented macros to be an energy-efficient alternative to traditio
ISSCC 2022
Session 16
AI / ML
DIMC: 2219TOPS/W 2569F2/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware
Ram K. Krishnamurthy2, Mingoo Seok1 Columbia University, New York, NY Intel, Portland, OR 1 2 In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computi
ISSCC 2022
Session 15
AI / ML
A 0.8V Intelligent Vision Sensor with Tiny Convolutional Neural Network and Programmable Weights Using Mixed-Mode Processing-in-Sensor Technique for Image Classification
artificial intelligence (AI) for applications requiring image classification are in growing demand. However, the imager plus dedicated AI accelerator solution [1] suffers from the burdens of power and latency caused by t
ISSCC 2022
Session 15
AI / ML
ARCHON: A 332.7TOPS/W 5b Variation-Tolerant Analog CNN Processor Featuring Analog Neuronal Computation Unit and Analog Memory
One of the notable trends in convolutional neural network (CNN) processor architecture is to embrace analog hardware to improve energy efficiency in performing multiply-and-accumulate (MAC). Prior works investigated charg
ISSCC 2022
Session 15
AI / ML
DIANA: An End-to-End Energy-Efficient DIgital and ANAlog Hybrid Neural Network SoC
Giuseppe M. Sarda1,2, Vikram Jain1, Man Shi1, Qilin Zheng1, Sebastian Giraldo1, Peter Vrancx2, Jonas Doevenspeck2, Debjyoti Bhattacharjee2, Stefan Cosemans2, Arindam Mallik2, Peter Debacker2, Diederik Verkest2, Marian Ve
ISSCC 2022
Session 15
AI / ML
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration
have been proposed for edge deep learning (DL) acceleration. They usually rely on analog CIM techniques to achieve high-efficiency NN inference with low-precision INT multiply-accumulation (MAC) support
ISSCC 2022
Session 15
AI / ML
COMB-MCM: Computing-on-Memory-Boundary NN Processor with Bipolar Bitwise Sparsity Optimization for Scalable Multi-Chiplet-Module Edge Machine Learning
Tianchan Guan2, Shengcheng Wang2, Dimin Niu2, Hongzhong Zheng2, Chixiao Chen1, Mingyu Wang1, Lihua Zhang1, Xiaoyang Zeng1, Qi Liu1, Yuan Xie2, Ming Liu1 Fudan University, Shanghai, China Alibaba DAMO Academy, Shanghai, C
ISSCC 2022
Session 15
AI / ML
A 65nm Systolic Neural CPU Processor for Combined Deep Learning and General-Purpose Computing with 95% PE
network (DNN) accelerators, few works have targeted improving the end-to-end performance of deep-learning tasks, where inter-layer pre/post-processing, data alignment and data movement across memory and processing units o
ISSCC 2022
Session 15
AI / ML
A Multi-Mode 8K-MAC HW-Utilization-Aware Neural Processing Unit with a Unified Multi-Precision Datapath in 4nm Flagship Mobile SoC
Taeho Jeon1, Yesung Kang1, Heonsoo Lee1, Dongwoo Lee1, James Kim1, YoungJong Lee1, Sangkyu Park 1, Jun-Woo Jang2, SangHyuck Ha1, MinSeong Kim1, Jihoon Bang1, Suk Hwan Lim1, Inyup Kang1 Samsung Electronics, Hwaseong, Kore
ISSCC 2022
Session 11
AI / ML
A 28nm 1Mb Time-Domain Computing-in-Memory 6T-SRAM Macro with a 6.6ns Latency, 1241GOPS and 37.01TOPS/W for 8b-MAC Operations for Edge-AI Devices
Jin-Sheng Ren1, Fu-Chun Chang1, Yuan Wu1, Ho-Yu Chen1, Chen-Hsun Lin1, Hsu-Ming Hsiao2, Sih-Han Li2, Shyh-Shyuan Sheu2, Shih-Chieh Chang2, Wei-Chung Lo2, Chung-Chuan Lo1, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1
ISSCC 2022
Session 11
AI / ML
A 1.041-Mb/mm2 27.38-TOPS/W Signed-INT8 Dynamic-Logic-Based ADC-less SRAM Compute-In-Memory Macro in 28nm with Reconfigurable Bitwise Operation for AI and Embedded Applications
China 4 Duke University, Durham, NC 1 2 Advanced intelligent embedded systems perform cognitive tasks with highly-efficient vector-processing units for deep neural network (DNN) inference and other vector-based signal pr
ISSCC 2022
Session 11
AI / ML
A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations
Rawan Naous2, Chao-Kai Chuang1, Takeshi Hashizume3, Dar Sun1, Chia-Fu Lee1, Kerem Akarvardar2, Saman Adham4, Tan-Li Chou1, Mahmut Ersin Sinangil2, Yih Wang1, Yu-Der Chih1, Yen-Huei Chen1, Hung-Jen Liao1, Tsung-Yung Jonat
ISSCC 2022
Session 11
AI / ML
Single-Mode CMOS 6T-SRAM Macros with Keeper-Loading-Free Peripherals and Row-Separate Dynamic Body Bias Achieving 2.53fW/bit Leakage for AIoT Sensing Platforms
Advanced Institute of Information Technology of Peking University, Hangzhou, China
Miniaturized wireless IoT sensor nodes stay mostly in their standby mode and wake up periodically to sense and store a small amount o
ISSCC 2022
Session 11
AI / ML
An 8-Mb DC-Current-Free Binary-to-8b Precision ReRAM Nonvolatile Computing-in-Memory Macro using Time-Space-Readout with 1286.4 - 21.6TOPS/W for Edge-AI Devices
Tai-Hao Wen1, Chin-I Su2, Win-San Khwa2, Chung-Chuan Lo1, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1, Yu-Der Chih2, Tsung-Yung Jonathan Chang2, Meng-Fan Chang1,2 National Tsing Hua University, Hsinchu, Taiwan TSMC
ISSCC 2022
Session 11
AI / ML
A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1-55.1TOPS/W 8b MAC for AI Operations
Fu-Chun Chang1, Yuan Wu1, Yu-An Chien1, Fang-Ling Hsieh1, Chung-Yuan Li1, Guan-Yi Lin1, Po-Jung Chen1, Tsen-Hsiang Pan1, Chung-Chuan Lo1, Win-San Khwa2, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1, Chieh-Pu Lo2, Yu
ISSCC 2021
Session 9
AI / ML
A 1/2.3inch 12.3Mpixel with On-Chip 4.97TOPS/W CNN Processor Back-Illuminated Stacked CMOS Image Sensor
Hareesh Gowtham2, Hidetomo Nakanishi2, Edan Almog3, Yoel Livne3, Gadi Yuval3, Eli Zyss3, Takashi Izawa2 Sony Semiconductor Solutions, Tokyo, Japan Sony Semiconductor Solutions, Atsugi, Japan 3 Sony Semiconductor Israel,
ISSCC 2021
Session 9
AI / ML
A 6K-MAC Feature-Map-Sparsity-Aware Neural Processing Unit in 5nm Flagship Mobile SoC
Hanwoong Jung2, Seungwon Lee2, Suknam Kwon1, Kyungah Jeong1, Joon-Ho Song2, SukHwan Lim1, Inyup Kang1 Samsung Electronics, Hwaseong, Korea Samsung Advanced Institute of Technology, Suwon, Korea 1 2 On-device machine lear
ISSCC 2021
Session 9
AI / ML
A 40nm 4.81TFLOPS/W 8b Floating-Point Training Processor for Non-Sparse Neural Networks Using Shared Exponent Bias and 24-Way Fused Multiply-Add Tree
*Equally Credited Authors (ECAs) Recent works on mobile deep-learning processors have presented designs that exploit sparsity [2, 3], which is commonly found in various neural networks. However, due to the shift in the m
ISSCC 2021
Session 9
AI / ML
A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction
edge devices efficiently, most existing CNN processors were built on quantized CNNs to optimize the inference operations. However, three issues (Fig. 9.2.1) have not been well addressed: 1) Duplicate weights in each kern
ISSCC 2021
Session 6
AI / ML
A 1.75dB-NF 25mW 5GHz Transformer-Based Noise-Cancelling CMOS Receiver Front-End
Massachusetts Institute of Technology, Cambridge, MA
With continuous exploitation of sub-6GHz wireless communication standards and the advent of 5G and 6G, the demand for faster speed and wider coverage keeps evolvin
ISSCC 2021
Session 33
AI / ML
A 1.25W 46.5%-Peak-Efficiency Transformer-in-Package Isolated DC-DC Converter Using Glass-Based Fan-Out Wafer-Level Packaging Achieving 50mW/mm2 Power Density
polyimide layers with a dielectric breakdown strength of >400V/μm are laminated among 3 RDLs to form isolation barriers, providing better than 5kV isolation rating. Consequently, the transformer achieves a coupling coefficient of 0.8, enabling over 1W power delivery.
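As a rough sanity check on how the quoted breakdown strength relates to the isolation rating, the sketch below estimates the total dielectric thickness needed under an idealized uniform-field assumption. The thickness is not given in the source; the variable names are hypothetical, and a real isolation barrier would be designed with substantial margin beyond this figure.

```python
# Values quoted in the fragment above (names are illustrative).
BREAKDOWN_V_PER_UM = 400.0    # lower bound on polyimide breakdown strength
ISOLATION_RATING_V = 5000.0   # stated isolation rating (better than 5kV)

# Idealized uniform-field estimate: thickness = voltage / field strength.
# This is an assumption for illustration, not a figure from the paper.
min_thickness_um = ISOLATION_RATING_V / BREAKDOWN_V_PER_UM

print(f"idealized dielectric thickness for 5kV: {min_thickness_um:.1f} um")
```

At the quoted lower-bound strength this comes to 12.5µm of polyimide in total, which suggests that a barrier built from a few laminated layers is plausible at package scale.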
Daquan Yu2, Ming Liu3, Lin Cheng1 Figure 33.5.3 shows the simplified schematic of the power stage. In the Tx, an LC tank oscillator with an AC-coupled structure is adopted. To handle a wide supply voltage (VDD) range of
ISSCC 2021
Session 26
AI / ML
A Watt-Level Quadrature Switched/Floated-Capacitor Power Amplifier with Back-Off Efficiency Enhancement in Complex Domain Using Reconfigurable Self-Coupling Canceling Transformer
Modern wireless communication systems in portable devices require transmitters (TXs) with watt-level output power (Pout), high data-rate, and high efficiency, especially at power back-off (PBO) for enhanced average effic
ISSCC 2021
Session 23
AI / ML
270-to-300GHz Double-Balanced Parametric Upconverter Using Asymmetric MOS Varactors and a Power-Splitting-Transformer Hybrid in 65nm CMOS
Oklahoma State University, Stillwater, OK
Wireless communication at ~300GHz is drawing attention due to its potential to support a high data-rate using the wide available bandwidth. Transmitters operating at ~300GHz
ISSCC 2021
Session 20
AI / ML
A 60GHz 186.5dBc/Hz FoM Quad-Core Fundamental VCO Using Circular Triple-Coupled Transformer with No Mode Ambiguity in 65nm CMOS
The recent development of the 5th-generation (5G) communication systems has set increasingly strict requirements on the spectral purity of millimeter-wave (mm-wave) local oscillators (LO). Low phase noise is crucial to en