JSSC 2019, Issue 6 · Data Converters · 0.18 μm
Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog
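An ultra-low-power voice activity detector that combines analog signal processing with digital deep neural network classification and consumes only 1 μW.
0.18 μm CMOS, 1.66 × 1.52 mm², 1 μW, 84.4%/85.4% speech/non-speech hit rate
Voice activity detector, ultra-low power, deep neural network, analog signal processing, event-driven analog-to-digital conversion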
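▸ Analog signal processing for acoustic feature extraction directly on the microphone output
▸ Approximate event-driven analog-to-digital converter (ED-ADC)
▸ Three-hidden-layer binarized multilayer perceptron (MLP) for speech/non-speech classification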
Abstract
This paper presents an ultra-low-power voice activity detector (VAD). It uses analog signal processing for acoustic feature extraction (AFE) directly on the microphone output, approximate event-driven analog-to-digital conversion (ED-ADC), and a digital deep neural network (DNN) for speech/non-speech classification. New circuits, including the low-noise amplifier, bandpass filter, and full-wave rectifier, contribute to the more than 9× normalized power/channel reduction in the feature extraction front-end compared to the best prior art. The digital DNN is a three-hidden-layer binarized multilayer perceptron (MLP) with a 2-neuron output layer and a 48-neuron input layer that receives parallel event streams from the ED-ADCs. To obtain the DNN weights via off-line training, a customized front-end model written in Python is constructed to accelerate feature generation in software emulation, and the model parameters are extracted from Spectre simulations. The chip, fabricated in 0.18-μm CMOS, has a core area of 1.66 × 1.52 mm² and consumes 1 μW. The classification measurements using the 1-hour 10-dB signal-to-noise ratio audio with restaurant background noise show a mean speech/non-speech hit rate of 84.4%/85.4% with a 1.88%/4.65% 1-σ variation across ten dies that are all loaded with the same weights.
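To make the classifier topology concrete, the following is a minimal Python sketch of a forward pass through a binarized MLP with 48 inputs, three hidden layers, and 2 outputs. It is not the authors' implementation. The abstract does not state the hidden-layer width, the input encoding, or which output neuron denotes speech, so `N_HIDDEN = 32`, the ±1 input encoding, the zero biases, the random weights, and the "output 0 is speech" convention are all assumptions made for illustration.

```python
import random

def sign(x):
    # Binarize to +1 / -1 (0 maps to +1 by convention in this sketch).
    return 1 if x >= 0 else -1

def make_layer(n_in, n_out, rng):
    # Random +/-1 weights and zero biases stand in for trained values.
    weights = [[rng.choice((-1, 1)) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0] * n_out
    return weights, biases

def forward(layers, x):
    # Hidden layers: +/-1 dot product followed by a sign activation.
    for weights, biases in layers[:-1]:
        x = [sign(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    # Output layer: keep the integer scores so the two neurons can be compared.
    weights, biases = layers[-1]
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def classify(layers, x):
    # Assumed convention: output neuron 0 is "speech", neuron 1 is "non-speech".
    scores = forward(layers, x)
    return "speech" if scores[0] > scores[1] else "non-speech"

# 48 inputs and 2 outputs come from the abstract; N_HIDDEN is hypothetical.
N_INPUT, N_HIDDEN, N_OUTPUT = 48, 32, 2
rng = random.Random(0)
sizes = [N_INPUT, N_HIDDEN, N_HIDDEN, N_HIDDEN, N_OUTPUT]
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# One hypothetical frame of binarized features, one value per input neuron.
events = [rng.choice((-1, 1)) for _ in range(N_INPUT)]
label = classify(layers, events)
```

With ±1 weights and activations, each multiply reduces to a sign flip, which is one reason binarized networks suit a very small power budget. In the chip described above, the weights come from off-line training rather than random initialization.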