JSSC 2024, Issue 3, Digital Circuits

EPU: An Energy-Efficient Explainable AI Accelerator With Sparsity-Free Computation and Heat Map Compression/Pruning

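EPU is the first hardware accelerator designed specifically for explainable AI. It improves system performance and reduces memory footprint.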

Explainable AI, hardware accelerator, data compression, sparsity-free computation, dynamic scheduling

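▸Innovation 1: New data compression format - The paper proposes a data compression format designed specifically for explainable AI workloads. It reduces the memory footprint and external memory accesses of heat maps and intermediate gradients, which improves overall system performance. It achieves a 7.01× compression ratio on heat map size, sharply lowering storage requirements.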

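▸Innovation 2: Sparsity-free computing core - With its sparsity-free computing core, the paper addresses the control overhead of handling input sparsity and achieves a throughput improvement of up to 9.48×. This circuit-level innovation improves computational efficiency while keeping power consumption low.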

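▸Innovation 3: Dynamic workload scheduling - The paper proposes a dynamic workload scheduling mechanism that, combined with a customized on-chip network, is optimized for the distinct inference and explanation tasks and reduces external memory access by 63.7%. This system-level innovation significantly increases internal data reuse.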

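▸Innovation 4: Point-wise gradient pruning (PGP) - As a complementary innovation, this technique further reduces the volume of heat map data through gradient pruning. It works together with the new compression format to improve memory efficiency, and it optimizes hardware resource utilization while preserving explanation accuracy.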
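The summary above does not describe the internals of the EPU compression format or of PGP, so the following is only a generic illustration of why pruning small gradient values and then storing the result sparsely shrinks a heat map. It assumes a simple magnitude threshold and a bitmap-plus-nonzeros encoding. The function names, the threshold, and the 16-bit value width are all hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def prune_by_magnitude(values: List[float], threshold: float) -> List[float]:
    """Zero out entries whose magnitude is below the threshold (assumed pruning rule)."""
    return [v if abs(v) >= threshold else 0.0 for v in values]


def bitmap_encode(values: List[float]) -> Tuple[List[int], List[float]]:
    """Store one presence flag per element plus only the nonzero values."""
    bitmap = [1 if v != 0.0 else 0 for v in values]
    nonzeros = [v for v in values if v != 0.0]
    return bitmap, nonzeros


def bitmap_decode(bitmap: List[int], nonzeros: List[float]) -> List[float]:
    """Rebuild the dense list from the flags and the packed nonzero values."""
    packed = iter(nonzeros)
    return [next(packed) if flag else 0.0 for flag in bitmap]


def compression_ratio(n_elements: int, n_nonzero: int, bits_per_value: int = 16) -> float:
    """Dense size divided by encoded size (1 flag bit per element + packed values)."""
    dense_bits = n_elements * bits_per_value
    encoded_bits = n_elements + n_nonzero * bits_per_value
    return dense_bits / encoded_bits


if __name__ == "__main__":
    # Toy flattened heat map; the values are made up for illustration.
    heat_map = [0.02, -0.9, 0.0, 0.4, -0.01, 0.0, 0.03, 0.75]
    pruned = prune_by_magnitude(heat_map, threshold=0.1)
    bitmap, nonzeros = bitmap_encode(pruned)
    assert bitmap_decode(bitmap, nonzeros) == pruned  # encoding itself is lossless
    print(compression_ratio(len(pruned), len(nonzeros)))
```

In this toy scheme the pruning step is the only lossy part, and the achievable ratio depends entirely on how many values fall below the threshold. The 7.01× figure reported for EPU comes from the paper's own format and pruning method, not from this sketch.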

Abstract

Deep neural networks (DNNs) have recently gained significant prominence in various real-world applications such as image recognition, natural language processing, and autonomous vehicles. However, due to their black-box nature, the underlying mechanisms of DNNs behind the inference results remain opaque to users. In order to address this challenge, researchers have focused on developing explainable artificial intelligence (AI) algorithms. Explainable AI aims to provide a clear and human-understandable explanation of the model's decision, thereby building more reliable systems. However, the explanation task differs from well-known inference and training processes as it involves interactions with the user. Consequently, existing inference and training accelerators face inefficiencies when processing explainable AI on edge devices. This article introduces the explainable processing unit (EPU), the first hardware accelerator designed for explainable AI workloads. The EPU utilizes a novel data compression format for the output heat maps and intermediate gradients to enhance the overall system performance by reducing both memory footprint and external memory access. Its sparsity-free computing core efficiently handles the input sparsity with negligible control overhead, resulting in a throughput boost of up to 9.48×. It also proposes a dynamic workload scheduling with a customized on-chip network for distinct inference and explanation tasks to maximize internal data reuse