ISSCC 2021
Session 16
AI / ML
eDRAM-CIM: Compute-In-Memory Design with Reconfigurable Embedded-Dynamic-Memory Array Realizing Adaptive Data Converters and Charge-Domain Computing
has led to massive amounts of data movement from off-chip memory to on-chip processing cores in modern machine learning (ML) accelerators. Compute-in-memory (CIM) designs performing analog DNN computations within a memor
ISSCC 2021
Session 15
AI / ML
A 65nm 3T Dynamic Analog RAM-Based Computing-in-Memory Macro and CNN Accelerator with Retention
computing inside memory macros have shown significant advantages in computing efficiency for deep learning applications. While earlier CIM macros were limited by lower bit precision, e.g. binary weights in [1], recent wo
ISSCC 2021
Session 15
AI / ML
A Programmable Neural-Network Inference Accelerator Based on Scalable In-Memory Computing
operations occur in the dedicated NMC BPBS SIMD module, which is optimized for 1-to-8b weights/activations, and further programmable element-wise operations (e.g., arbitrary activation functions) occur in the NMC CMPT SIMD module. Jinseok Lee, Naveen Verma Figure 15.1.3 shows a sample of the operations enabled by CIMU configurability and the SW instruction libraries. In addition to temporal mapping of NN layers, the architecture provides extensive
ISSCC 2021
Session 12
AI / ML
A 148nW General-Purpose Event-Driven Intelligent Wake-Up Chip for AIoT Devices Using Asynchronous Spike-Based Feature Extractor and Convolutional Neural Network
University, Hangzhou, China 4 XINYI Information Technology, Shanghai, China Power is a major bottleneck in AIoT devices, which usually operate in random-sparse-event (RSE) scenarios [1] (Fig. 12.1.1, bottom). To proce
ISSCC 2020
Session 7
AI / ML
GANPU: A 135TFLOPS/W Multi-DNN Training Processor for GANs with Speculative Dual-Sparsity Exploitation
image style transfer to synthetic voice generation [1]. GAN applications on mobile devices, such as face-to-Emoji conversion and super-resolution imaging, enable more engaging user interaction. As shown in Fig. 7.4.1, a
ISSCC 2020
Session 7
AI / ML
STATICA: A 512-Spin 0.25M-Weight Full-Digital Annealing Processor with a Near-Memory All-Spin-Updates-at-Once Architecture for Combinatorial Optimization with Complete Spin-Spin Interactions
Masanao Yamaoka3, Hiroshi Teramoto2, Akira Sakai2, Shinya Takamaeda-Yamazaki4, Masato Motomura1 Tokyo Institute of Technology, Yokohama, Japan Hokkaido University, Sapporo, Japan, 3Hitachi, Sapporo, Japan 4 University of
ISSCC 2020
Session 7
AI / ML
A 12nm Programmable Convolution-Efficient Neural-Processing-Unit Chip Achieving 825TOPS
Yun Li1, Long Chen1, Zhen Chen1, Lu Liu3, Zhuyu He3, Yu Yan3, Jun He3, Jun Mao3, Xiaotao Zai3, Xuejun Wu3, Yongquan Zhou3, Mingqiu Gu1, Guocai Zhu1, Rong Zhong1, Wenyuan Lee1, Ping Chen1, Yiping Chen1, Weiliang Li3, Deyu
ISSCC 2020
Session 7
AI / ML
A 3.4-to-13.3TOPS/W 3.6TOPS Dual-Core Deep-Learning Accelerator for Versatile AI Applications in 7nm 5G Smartphone SoC
Yu-Ting Kuo, Perry H Wang, Pei-Kuei Tsung, Jeng-Yun Hsu, Wei-Chih Lai, Chia-Hung Liu, Shao-Yu Wang, Chin-Hua Kuo, Chih-Yu Chang, Ming-Hsien Lee, Tsung-Yao Lin, Chih-Cheng Chen MediaTek, Hsinchu, Taiwan Recent advancement
ISSCC 2020
Session 34
AI / ML
1225-Channel Localized Temperature-Regulated Neuromorphic Retinal-Prosthesis SoC with 56.3nW/Channel Image Processor
N.1 Institute for Health, Singapore, Singapore Retinal prosthesis (RP) electrically stimulates the retinal cells of the blind to restore visual loss, and it is essential to increase the number of channels for higher
ISSCC 2020
Session 33
AI / ML
A Fully Integrated Analog ReRAM Based 78.4TOPS/W Compute-In-Memory Chip with Fully Parallel MAC Computing
cell is signed quasi-2-bit (3-level) or signed quasi-3-bit (7-level) accordingly. on-chip ReRAM conductance could be quantified with 256 states at most. Qi Liu1, Bin Gao1, Peng Yao1, Dong Wu1, Junren Chen1, Yachuan Pang1, Wenqiang Zhang1, Yan Liao1, Cheng-Xin Xue2, Wei-Hao Chen2, Jianshi Tang1, Yu Wan
ISSCC 2020
Session 31
AI / ML
CIM-Spin: A 0.5-to-1.2V Scalable Annealing Processor Using Digital Compute-In-Memory Spin Operators and Register-Based Spins for Combinatorial Optimization Problems
*Equally-Credited Authors (ECAs) Annealing processors [1-3] based on the convergence property of the Ising model offer an attractive means for solving combinatorial optimization problems [4]. A recently developed anneali
ISSCC 2020
Session 26
AI / ML
A Neuromorphic Multiplier-Less Bit-Serial Weight-Memory-Optimized 1024-Tree Brain-State Classifier and Neuromodulation SoC with an 8-Channel Noise-Shaping SAR ADC Array
Camilo Tejeiro1, Maged ElAnsary1, Chenxi Tang1, Homeira Moradi2, Prajay Shah1, Taufik A. Valiante3, Roman Genov1 University of Toronto, Toronto, Canada Krembil Neuroscience Center, Toronto, ON, Canada 3 Toronto Western H
ISSCC 2020
Session 24
AI / ML
A 15b Quadrature Digital Power Amplifier with Transformer-Based Complex-Domain Power-Efficiency Enhancement
law to provide compact die area, better interface to digital back-end, and higher power efficiency due to the faster switching nature of core devices even in face of reduced supply voltages. Moreover, the integration of
ISSCC 2020
Session 24
AI / ML
A 24-to-30GHz Watt-Level Broadband Linear Doherty Power Amplifier with Multi-Primary Distributed-Active-Transformer Power-Combining Supporting 5G NR FR2 64-QAM with >19dBm Average Pout and >19% Average PAE
The continuous worldwide demand for multi-Gb/s data-rate has driven the rapid development and standardization of 5G New Radio (NR) specifications in the mm-wave bands [1-3]. As a result, there is a surge of interest in hi
ISSCC 2020
Session 18
AI / ML
A Fully-Generic-Process Galvanic Isolator for Gate Driver with 123mW 23% Power Transfer and Full-Triplex 21/14/0.5Mb/s Bidirectional Communication Utilizing Reference-Free Dual-Modulation FSK
DATA2 is transferred in the same manner as DATA1 through another transformer. Almost all the circuits excluding the DATA1 driver and I/O buffer operate at 1.5V supply to support sufficiently high operating speed. The driver operates with 5.5V supply to transfer power by the same transformer, and the rectifier placed in parallel with the DATA1 receiver extracts the received power. Thanks to the VCO synchronization, demodulation can be realized by a delay-based digital frequency counting normalized by the oscillation period (tVCO). But there is a challenge in that the received signal edge is not per
ISSCC 2020
Session 15
AI / ML
A 28nm 64Kb 6T SRAM Computing-in-Memory Macro with 8b MAC Operation for AI Edge Chips
multiplication results (PL = IN[1:0] ⋅ W) to HGBLB. Jing-Hong Wang1, Ta-Wei Liu1, Ssu-Yen Wu1, Ruhui Liu1, Yen-Chi Chou1, Zhixiao Zhang1, Syuan-Hao Sie1, Wei-Chen Wei1, Yun-Chen Lo1, Tai-Hsing Wen1, Tzu-Hsiang Hsu1, Yen-Kai Chen1, William Shih1, Chung-Chuan Lo1, Ren-Shuo
ISSCC 2020
Session 15
AI / ML
A 22nm 2Mb ReRAM Compute-in-Memory Macro with 121-28TOPS/W for Multibit MAC Computing for Tiny AI Edge Devices
Hui-Yao Kao, Jing-Hong Wang, Ta-Wei Liu, Shih-Ying Wei, Sheng-Po Huang, Wei-Chen Wei, Yi-Ren Chen, Tzu-Hsiang Hsu, Yen-Kai Chen, Yun-Chen Lo, Tai-Hsing Wen, Chung-Chuan Lo, Ren-Shuo Liu, Chih-Cheng Hsieh, Kea-Tiong Tang,
ISSCC 2020
Session 15
AI / ML
A 351TOPS/W and 372.4GOPS Compute-in-Memory SRAM Macro in 7nm FinFET CMOS for Machine-Learning Applications
computations and reduces off-chip weight access to reduce energy consumption and latency, specifically for AI edge devices. Prior CIM approaches demonstrated tradeoffs for area, noise margin, process variation and weight
ISSCC 2020
Session 15
AI / ML
A 28nm 64Kb Inference-Training Two-Way Transpose Multibit 6T SRAM Compute-in-Memory Macro for AI Edge Chips
Wei-Hsing Huang1, Yung-Ning Tu1, Ruhui Liu1, Pei-Jung Lu1, Ta-Wei Liu1, Jing-Hong Wang1, Zhixiao Zhang1, Hongwu Jiang3, Shanshi Huang3, Chung-Chuan Lo1, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1, Shyh-Shyuan Sheu
ISSCC 2020
Session 15
AI / ML
A 5nm 135Mb SRAM in EUV and High-Mobility-Channel FinFET Technology with Metal Coupling and Charge-Sharing Write-Assist Circuitry Schemes for High-Density and Low-VMIN Applications
Po-Sheng Wang, Yangsyu Lin, Hidehiro Fujiwara, Robin Lee, Hung-Jen Liao, Ping-Wei Wang, Geoffrey Yeap, Quincy Li TSMC, Hsinchu, Taiwan Despite recent advances, low-voltage operation remains one of the key approaches for
ISSCC 2020
Session 14
AI / ML
A 65nm Computing-in-Memory-Based CNN Processor with 2.9-to-35.8TOPS/W System Energy Efficiency Using Dynamic-Sparsity Performance-Scaling Architecture and Energy-Efficient Inter/Intra-Macro Data Reuse
University, Hsinchu, Taiwan Computing-in-Memory (CIM) is a promising solution for energy-efficient neural network (NN) processors. Previous CIM chips [1-4] mainly focus on the memory macro itself, lacking insight on
ISSCC 2020
Session 14
AI / ML
A 65nm 24.7µJ/Frame 12.3mW Activation-Similarity-Aware Convolutional Neural Network Video Processor
Zhe Yuan1,2, Yixiong Yang1, Jinshan Yue1,2, Ruoyang Liu1, Xiaoyu Feng1, Zhiting Lin3, Xiulong Wu3, Xueqing Li1, Huazhong Yang1, Yongpan Liu1 Tsinghua University, Beijing, China Pi2star Technology, Beijing, China 3 Anhui
ISSCC 2020
Session 14
AI / ML
A 510nW 0.41V Low-Memory Low-Computation Keyword-Spotting Chip Using Serial FFT-Based MFCC and Binarized Depthwise Separable Convolutional Neural Network in 28nm CMOS
is a strong requirement for always-on speech interfaces in wearable and mobile devices, such as Voice Activity Detection (VAD) and Keyword Spotting (KWS) [1-5]. A KWS system is used to detect specific wake-up words by sp
ISSCC 2020
Session 1
AI / ML
The Deep Learning Revolution and Its Implications for Computer Architecture and Chip Design
Jeffrey Dean The past decade has seen a remarkable series of advances in machine learning, and in particular deep-learning approaches based on artificial neural networks, to improve our abilities to build more accurate systems across
ISSCC 2019
Session 7
AI / ML
LNPU: A 25.3TFLOPS/W Sparse Deep-Neural-Network Learning Processor with Fine-Grained Mixed Precision of FP8-FP16
for energy-efficient deep learning (DL) acceleration [1-6]. Most prior DNN inference accelerators are trained in the cloud using public datasets; parameters are then downloaded to implement AI [1-5]. However, local DNN le
ISSCC 2019
Session 7
AI / ML
A 65nm 236.5nJ/Classification Neuromorphic Processor with 7.5% Energy Overhead On-Chip Learning Using Direct Spike-Only Feedback
classifying handwritten digits, the learning rule can be directly adopted in general fully-connected networks with different network structures and hence can be extended to other applications. For example, the algorithm
ISSCC 2019
Session 7
AI / ML
A 65nm 0.39-to-140.3TOPS/W 1-to-12b Unified Neural-Network Processor Using Block-Circulant-Enabled Transpose-Domain Acceleration with 8.1× Higher TOPS/mm2 and 6T HBST-TRAM-Based 2D Data-Reuse Architecture
Yung-Ning Tu2, Yi-Ju Chen2, Ao Ren3, Yanzhi Wang3, Meng-Fan Chang2, Xueqing Li1, Huazhong Yang1, Yongpan Liu1 Tsinghua University, Beijing, China National Tsing Hua University, Hsinchu, Taiwan 3 Northeastern University,
ISSCC 2019
Session 7
AI / ML
A 2.1TFLOPS/W Mobile Deep RL Accelerator with Transposable PE Array and Experience Compression
but also for action control, so that an autonomous system, such as the robot, can perform human-like behaviors and operations. Unlike recognition tasks, real-time operation is important in action control, and it is too s
ISSCC 2019
Session 7
AI / ML
An 879GOPS 243mW 80fps VGA Fully Visual CNN-SLAM Processor for Wide-Range Autonomous Exploration
University of Michigan, Ann Arbor, MI Simultaneous localization and mapping (SLAM) estimates an agent’s trajectory for all six degrees of freedom (6 DoF) and constructs a 3D map of an unknown surrounding. It is a fundame
ISSCC 2019
Session 7
AI / ML
A 20.5TOPS and 217.3GOPS/mm2 Multicore SoC with DNN Accelerator and Image Signal Processor Complying with ISO26262 for Automotive Applications
Soichiro Hosoda, Fumihiko Hyuga, Akira Moriya, Ryuji Hada, Atsushi Masuda, Masato Uchiyama, Tomohiro Koizumi, Takanori Tamai, Nobuhiro Sato, Jun Tanabe, Katsuyuki Kimura, Ryusuke Murakami, Takashi Yoshikawa Toshiba Elect
ISSCC 2019
Session 7
AI / ML
An 11.5TOPS/W 1024-MAC Butterfly Structure Dual-Core Sparsity-Aware Neural Processing Unit in 8nm Flagship Mobile SoC
widely applied for image and speech recognition. Response time, connectivity, privacy and security drive applications towards mobile platforms rather than cloud. For mobile systems-on-a-chip (SoCs), energy-efficient neural
ISSCC 2019
Session 4
AI / ML
A 60GHz CMOS Power Amplifier with Cascaded Asymmetric Distributed-Active-Transformer Achieving Watt-Level Peak Output Power with 20.8% PAE and Supporting 2Gsym/s 64-QAM Modulation
Since its invention in the early 2000s, the Distributed Active Transformer (DAT) has been a popular power-combiner technique for high-power high-efficiency Power Amplifiers (PAs) in voltage-limited silicon processes at GHz frequen
ISSCC 2019
Session 4
AI / ML
A Highly Linear High-Power 802.11ac/ax WLAN SiGe HBT Power Amplifier Using a Compact 2nd-Harmonic-Shorting Four-Way Transformer and Integrated Thermal Sensors
proposed four-way transformer achieves simultaneous fundamental and 2nd-harmonic impedance matching with the efficient parallel power combining capability. Mark Doherty2, John D. Cressler1 The electro-thermal behavior of the output stage is described in Fig. 4.4.3. The thermal sensor QREF is located between two SiGe HBT arrays QP/QN in a symmetrical fashion to sense and tra
ISSCC 2019
Session 4
AI / ML
A Broadband Switched-Transformer Digital Power Amplifier for Deep Back-Off Efficiency Enhancement
Sophisticated OFDM modulation schemes with high spectrum efficiency and data throughput in modern wireless communication systems often result in a large peak-to-average power ratio (PAPR). Besides, wireless standards like
ISSCC 2019
Session 28
AI / ML
A 606µW mm-Scale Bluetooth Low-Energy Transmitter Using Co-Designed 3.5×3.5mm2 Loop Antenna and Transformer-Boost Power Oscillator
Wireless communication has been a limiting factor for achieving millimeter-sized wireless sensor nodes because of the high power consumption, large antenna size and off-chip components typically required. Several mm-scal
ISSCC 2019
Session 26
AI / ML
A 0.1-to-0.2V Transformer-Based Switched-Mode Folded DCO in 22nm FDSOI with Active Step-Down Impedance Achieving 197dBc/Hz Peak FoM and 40MHz/V Frequency Pushing
The continuous improvement in ultra-low-power/voltage (ULP/V) circuits paves the way for new near-battery-free IoE applications. In this work, we propose an LC oscillator that can significantly improve phase-noise (PN) pe
ISSCC 2019
Session 24
AI / ML
A Twin-8T SRAM Computation-In-Memory Macro for Multiple-Bit CNN-Based Machine Learning
Jing-Hong Wang1, Yen-Cheng Chiu1, Wei-Chen Wei1, Ssu-Yen Wu1, Xiaoyu Sun3, Rui Liu3, Shimeng Yu4, Ren-Shuo Liu1, Chih-Cheng Hsieh1, Kea-Tiong Tang1, Qiang Li2, Meng-Fan Chang1 National Tsing Hua University, Hsinchu, Taiw
ISSCC 2019
Session 24
AI / ML
A 1Mb Multibit ReRAM Computing-In-Memory Macro with 14.6ns Parallel MAC Computing Time for CNN-Based AI Edge Processors
Wei-En Lin, Jing-Hong Wang, Wei-Chen Wei, Ting-Wei Chang, Tung-Cheng Chang, Tsung-Yuan Huang, Hui-Yao Kao, Shih-Ying Wei, Yen-Cheng Chiu, Chun-Ying Lee, Chung-Chuan Lo, Ya-Chin King, Chorng-Jung Lin, Ren-Shuo Liu, Chih-C
ISSCC 2019
Session 2
AI / ML
A 2×30k-Spin Multichip Scalable Annealing Processor Based on a Processing-In-Memory Approach for Solving Large-Scale Combinatorial Optimization Problems
computer architecture, commonly known as annealing processor [1, 2]. An annealing processor provides a fast means for finding the ground state of an Ising model; thus, it can efficiently solve NP-hard combinatorial optimiz
ISSCC 2019
Session 2
AI / ML
A 40×40 Four-Neighbor Time-Based In-Memory Computing Graph ASIC Chip Featuring Wavefront Expansion and 2D Gradient Control
Single-source shortest path (SSP) problems have a rich history of algorithm development [1-3]. SSP has many applications including AI decision making, robot navigation, VLSI signal routing, autonomous vehicles and many o
ISSCC 2019
Session 15
AI / ML
A 52% Peak-Efficiency >1W Isolated Power Transfer System Using Fully Integrated Magnetic-Core Transformer
wide field of applications to guarantee human safety or better reliability in harsh industrial environments. The transfer of both data and power across an isolation barrier is often an essential requisite in these applica
ISSCC 2019
Session 14
AI / ML
A Modular Hybrid LDO with Fast Load-Transient Response and Programmable PSRR in 14nm CMOS Featuring Dynamic Clamp Tuning and Time-Constant Compensation
Khondker Z. Ahmed, Krishnan Ravichandran, James Tschanz, Vivek De Intel, Hillsboro, OR Complex SoCs in scaled CMOS processes integrate a large variety of digital, SRAM and noise-sensitive mixed-signal/analog circuit bloc
ISSCC 2019
Session 14
AI / ML
A 745pA Hybrid Asynchronous Binary-Searching and Synchronous Linear-Searching Digital LDO with 3.8×10^5
Shuo Li, Benton H. Calhoun University of Virginia, Charlottesville, VA Voltage regulators for emerging nW-to-μW Internet-of-Things (IoT) systems-on-chip (SoCs) require ultra-low quiescent power to enhance lifetime, a larg
ISSCC 2019
Session 14
AI / ML
A 0.6-to-1.1V Computationally Regulated Digital LDO with 2.79-Cycle Mean Settling Time and Autonomous Runtime Gain Tracking in 65nm CMOS
University of Washington, Seattle, WA Low-Dropout Regulators (LDOs) play an important role in enabling fine-grained supply-voltage domains for energy-efficient SoC design [1]. Digital LDOs are of particular interest due to
ISSCC 2019
Session 14
AI / ML
All-Digital Time-Domain CNN Engine Using Bidirectional Memory Delay Lines for Energy-Efficient Edge Computing
Convolutional Neural Networks (CNN) provide superior classification accuracy in a variety of machine learning applications, such as image/speech/sensor data processing. However, CNNs require intensive compute and memory r
ISSCC 2019
Session 14
AI / ML
A 43pJ/Cycle Non-Volatile Microcontroller with 4.7µs Shutdown/Wake-up Integrating 2.3-bit/Cell Resistive RAM and Resilience Techniques
William Hwang1, Seungbin Jeong1, Haitong Li1, Pulkit Tandon1, Elisa Vianello2, Pascal Vivet2, Etienne Nowak2, Mary K. Wootters1, H.-S. Philip Wong1, Mohamed M. Sabry Aly3, Edith Beigne2, Subhasish Mitra1 Stanford Univers
ISSCC 2019
Session 14
AI / ML
A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration
factors in the energy and performance of both general purpose CPUs and GPUs. This has led to extensive research focused on in-memory computing, which moves computation to where the data is located. With this approach, co
ISSCC 2019
Session 14
AI / ML
A 65nm 1.1-to-9.1TOPS/W Hybrid-Digital-Mixed-Signal Computing Platform for Accelerating Model-Based and Model-Free Swarm Robotics
Artificial swarm intelligence, inspired by biological studies of insects, ants and other organisms, presents an emerging computing paradigm, where seemingly simple elements interact with each other to collectively solve ch
ISSCC 2018
Session 7
AI / ML
A 55nm Time-Domain Mixed-Signal Neuromorphic Accelerator with Stochastic Synapses and Embedded Reinforcement Learning for Autonomous Micro-Robots
networks (DNNs) and convolutional neural networks (CNNs) with most hardware demonstrations geared towards inference in vision-based platforms [1-5], we recognize that true autonomy in intelligent agents will only emerge
ISSCC 2018
Session 31
AI / ML
A 65nm 4Kb Algorithm-Dependent Computing-in-Memory SRAM Unit-Macro with 2.3ns and 55.8TOPS/W Fully Parallel Product-Sum Operation for Binary DNN Edge Processors
Technology of China, Sichuan, China 4 Arizona State University, Tempe, AZ For deep-neural-network (DNN) processors [1-4], the product-sum (PS) operation predominates the computational workload for both convolution (C