JSSC 2020, Issue 7 | Memory | 65nm | Neural Network Accelerator

An 8.93 TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical C

Proposes an LSTM neural network accelerator based on hierarchical coarse-grain sparsity, enabling energy-efficient speech recognition.
65-nm LP CMOS, 8.93 TOPS/W
LSTM | Neural network accelerator | Hierarchical coarse-grain sparsity | Speech recognition | Energy-efficiency optimization
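Innovation 1: Algorithm-hardware co-optimization of hierarchical coarse-grain sparsity (HCGS) (method innovation). Co-designing the algorithm and the hardware yields efficient weight compression that reduces storage and computation requirements, reaching a compression ratio of up to 16× while maintaining a low error rate.
Innovation 2: Block-recursive weight compression (method innovation). Compressing weights recursively at block granularity substantially reduces the index memory overhead of weight storage, addressing the inefficiency of conventional elementwise sparsity schemes.
Innovation 3: Energy-efficient LSTM network implementation (system innovation). In 65-nm LP CMOS, the design achieves an energy efficiency of 8.93 TOPS/W, is suited to real-time speech recognition, and is validated with low error rates on the TIMIT, TED-LIUM, and LibriSpeech datasets.
Innovation 4: Hardware accelerator design (circuit innovation). Optimized memory access and compute units increase the parallel processing capability of the LSTM network and further reduce energy consumption.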
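The first two points above (block-level, hierarchical weight compression and its small index overhead) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function name `hcgs_mask`, the 64×64 matrix, the block sizes (16 and 4), the random block selection, and the 4× keep ratio per level are all assumptions chosen only so that two levels multiply to the 16× compression mentioned above.

```python
import random

def hcgs_mask(rows, cols, big, small, keep_big, keep_small, seed=0):
    """Build a two-level block-sparse 0/1 mask (illustrative assumption only).

    Level 1: in every row of big x big blocks, keep `keep_big` random blocks.
    Level 2: inside each kept big block, in every row of small x small
             sub-blocks, keep `keep_small` random sub-blocks.
    Returns the mask and the number of kept big and small blocks.
    """
    assert rows % big == 0 and cols % big == 0 and big % small == 0
    rng = random.Random(seed)
    mask = [[0] * cols for _ in range(rows)]
    subs_per_side = big // small
    n_big = n_small = 0
    for br in range(rows // big):
        # Level 1: choose which big blocks survive in this block-row.
        for bc in rng.sample(range(cols // big), keep_big):
            n_big += 1
            for sr in range(subs_per_side):
                # Level 2: choose which sub-blocks survive in this sub-row.
                for sc in rng.sample(range(subs_per_side), keep_small):
                    n_small += 1
                    r0 = br * big + sr * small
                    c0 = bc * big + sc * small
                    for r in range(r0, r0 + small):
                        for c in range(c0, c0 + small):
                            mask[r][c] = 1
    return mask, n_big, n_small

rows = cols = 64
mask, n_big, n_small = hcgs_mask(rows, cols, big=16, small=4,
                                 keep_big=1, keep_small=1)
nonzeros = sum(map(sum, mask))
compression = rows * cols / nonzeros   # 4x per level, two levels
block_indices = n_big + n_small        # one index per kept block
element_indices = nonzeros             # one index per kept weight
print(compression, block_indices, element_indices)
```

In this toy setting the block-level scheme stores one index per kept block, whereas an elementwise scheme would store one index per surviving weight, which is the index-memory gap the second point refers to.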
Abstract
Long short-term memory (LSTM) is a type of recurrent neural network (RNN) that is widely used for time-series data and speech applications because of its high accuracy on such tasks. However, LSTMs are difficult to implement efficiently in hardware because they require a large amount of weight storage and exhibit high computational complexity. Prior works have proposed compression techniques to alleviate the storage/computation requirements of LSTMs, but elementwise sparsity schemes incur sizable in