ISSCC 2015 · Session 4 · PROCESSORS · AI / ML

A 1.93TOPS/W Scalable Deep Learning/Inference Processor with Tetra-Parallel MIMD Architecture for Big-Data Applications

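⚡ This page contains AI-generated analysis and is for reference only

📋 Paper Summary

This paper presents a scalable deep learning/inference processor for big-data applications. It uses a tetra-parallel MIMD architecture and achieves an energy efficiency of 1.93TOPS/W. To address the compute and bandwidth bottlenecks caused by the massively iterative weight updates in deep learning training, the processor combines a parallel architecture with memory optimizations to significantly improve performance.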

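💡 Key Innovations

Proposes a tetra-parallel MIMD architecture that supports efficient parallel processing of multiple deep learning models.
Uses a scalable design to adapt to big-data applications of different sizes while achieving high energy efficiency (1.93TOPS/W).
Optimizes compute and data-bandwidth utilization for restricted Boltzmann machine training in unsupervised learning.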
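
To make the "massively iterative weight updates" for a restricted Boltzmann machine more concrete, here is a minimal sketch of one common training rule, single-step contrastive divergence (CD-1), in plain Python. This is an assumption for illustration only: the abstract does not state which update rule the processor implements, and every name below (`cd1_update`, `train`, `hidden_probs`, and so on) is hypothetical rather than taken from the paper.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for RBM unit activation probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, W, c):
    """P(h_j = 1 | v) for each hidden unit j; W is [n_visible][n_hidden]."""
    return [sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
            for j in range(len(c))]


def visible_probs(h, W, b):
    """P(v_i = 1 | h) for each visible unit i."""
    return [sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
            for i in range(len(b))]


def sample(probs, rng):
    """Draw binary states from independent Bernoulli probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """Apply one CD-1 update in place for a single binary input vector v0."""
    ph0 = hidden_probs(v0, W, c)               # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step (reconstruction)
    ph1 = hidden_probs(v1, W, c)               # negative phase
    for i in range(len(v0)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
    for i in range(len(b)):
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


def train(data, n_hidden, epochs, lr=0.1, seed=0):
    """Run CD-1 over every input vector for a number of epochs (toy scale)."""
    rng = random.Random(seed)
    n_visible = len(data[0])
    W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
         for _ in range(n_visible)]
    b = [0.0] * n_visible
    c = [0.0] * n_hidden
    for _ in range(epochs):
        for v0 in data:
            cd1_update(v0, W, b, c, lr, rng)
    return W, b, c
```

Every input vector touches every weight in every epoch, so the work grows with epochs × input vectors × visible units × hidden units. That product is the kind of iterative load the summary refers to, although the sketch says nothing about how the chip itself schedules or parallelizes it.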

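Key Metric

1.93TOPS/W

Importance

High

Publication Year

ISSCC 2015

🏷 Keywords

Deep learning · Inference processor · MIMD architecture · High energy efficiency · Big-data applications

📄 Original Abstract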

Recently, deep learning (DL) has become a popular approach for big-data analysis in image retrieval with high accuracy [1]. As Fig. 4.6.1 shows, various applications, such as text, 2D image and motion recognition use DL due to its best-in-class recognition accuracy. There are 2 types of DL: supervised DL with labeled data and unsupervised DL with unlabeled data. With unsupervised DL, most of learning time is spent in massively iterative weight updates for a restricted Boltzmann machine [2]. For a ~100MB training dataset, >100 TOP computational capability and ~40GB/s IO and SRAM data bandwidth is required. So, a 3.4GHz CPU needs >10 hours learning time with a ~100K input-vector dataset and takes ~1 second for recognition, which is far from real-time processing. Thus, DL is typically done using cloud servers or high-performance GPU environments with learning-on-server capability. However, the wide use of

👥 Authors and Affiliations

Seongwook Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi, Hoi-Jun Yoo

KAIST, Daejeon, Korea

Category: AI / ML · Year: ISSCC 2015