JSSC 2022, Issue 6 · RF & Wireless · 40nm · Phased Array

An Eight-Element Frequency-Selective Acoustic Beamformer and Bitstream Feature Extractor

Seungjong Lee, Student Member, IEEE, Taewook Kang, Student Member, IEEE

Combines delay-and-sum and constant-directivity beamforming to improve the noise robustness of speech recognition.

40 nm CMOS, 1.1 mm², 84 dB SNDR, 8 kHz bandwidth

Beamforming · Σ-Δ modulator · Mel spectrum · Speech recognition · Noise robustness

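▸ Innovation 1: Method - Combines delay-and-sum (DAS) beamforming with constant-directivity beamforming (CDB). CDB restricts the bandwidth used for each microphone configuration, which makes DAS more efficient and improves noise rejection. Keyword spotting accuracy in the presence of noise rises from 74% without beamforming to 93% with beamforming.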

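▸ Innovation 2: Circuit - An array of Σ-Δ modulators (SDMs) directly digitizes the eight microphone inputs. Bitstream processing of the modulator outputs keeps power low (91 µW per SDM) at a measured SNDR of 84 dB over an 8 kHz bandwidth, and simplifies the analog front end.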

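▸ Innovation 3: System - The 60 Mel spectrum power features are extracted directly in the bitstream domain, avoiding the complexity of a conventional ADC followed by digital signal processing. The feature extractor consumes only 122 µW of dynamic power, giving an efficient end-to-end speech processing path.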

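▸ Innovation 4: Process - Fabricated in 40 nm CMOS with a compact 1.1 mm² area and a total power consumption of only 3.95 mW (including leakage), providing an energy-efficient hardware solution for portable speech recognition devices.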
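
The delay-and-sum (DAS) idea in the first point above can be illustrated with a minimal Python sketch. This is not the chip's implementation: the paper's beamformer works on modulator bitstreams and adds frequency selectivity through CDB, whereas this sketch is plain time-domain DAS on ordinary samples with whole-sample delays. The function names, the 4 cm microphone pitch, the 48 kHz sample rate, and the 30 degree source angle are all assumptions chosen for illustration.

```python
import math
import random

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value


def arrival_samples(mic_positions_m, angle_rad, fs_hz):
    """Plane-wave arrival time at each microphone, in whole samples (earliest = 0)."""
    arrival_s = [x * math.sin(angle_rad) / SPEED_OF_SOUND for x in mic_positions_m]
    earliest = min(arrival_s)
    return [round((t - earliest) * fs_hz) for t in arrival_s]


def steering_delays(arrivals):
    """Delay every channel so that it lines up with the latest-arriving one."""
    latest = max(arrivals)
    return [latest - a for a in arrivals]


def delay(signal, d):
    """Delay a sequence by d >= 0 samples, zero-padding the start."""
    return [0.0] * d + list(signal[: len(signal) - d])


def delay_and_sum(channels, delays):
    """Delay each channel by its steering delay, then average across channels."""
    shifted = [delay(ch, d) for ch, d in zip(channels, delays)]
    return [sum(column) / len(shifted) for column in zip(*shifted)]


def power(signal):
    """Mean squared value of a sequence."""
    return sum(v * v for v in signal) / len(signal)


# Toy demo (all values assumed): 8 microphones on a line, white-noise source at 30 degrees.
random.seed(0)
source = [random.gauss(0.0, 1.0) for _ in range(4000)]
mics = [0.04 * i for i in range(8)]
arrivals = arrival_samples(mics, math.radians(30), 48_000)
channels = [delay(source, a) for a in arrivals]            # what each microphone hears
steered = delay_and_sum(channels, steering_delays(arrivals))
unsteered = delay_and_sum(channels, [0] * len(channels))   # beam left pointing at broadside
print("steered power:", power(steered), "unsteered power:", power(unsteered))
```

When the steering delays match the arrival pattern, the eight channels add coherently and the source is recovered (delayed by the largest arrival offset). When the beam is left at broadside, the same source adds incoherently and is attenuated, which is the mechanism DAS uses to reject sound from unwanted directions.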

Abstract

Beamforming is an essential tool for speaker selection and rejection of environmental noise in automatic speech recognition. This work harnesses the efficiency of delay-and-sum (DAS) beamforming by combining it with constant-directivity beamforming (CDB) and frequency-domain feature extraction. CDB facilitates DAS by restricting the bandwidth for different microphone configurations. An array of sigma-delta modulators (SDMs) digitizes eight microphone inputs. The design takes advantage of bitstream processing of the modulator outputs for beamforming and extracting 60 Mel spectrum power features. The prototype device is fabricated in 40-nm CMOS and occupies 1.1 mm². Each SDM consumes 91 µW and has a measured signal-to-noise and distortion ratio of 84 dB for an 8-kHz bandwidth. The beamformer and feature extractor consume a dynamic power of 76 and 122 µW, respectively. The entire power consumption of the prototype is 3.95 mW, including leakage power. Processing the Mel spectrum outputs with a DNN, the keyword spotting accuracy in the presence of noise improves from 74% without beamforming to 93% with beamforming.
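
To make the notion of a modulator bitstream concrete, here is a minimal Python sketch of a sigma-delta modulator. It assumes a textbook first-order loop with a 1-bit quantizer; the abstract does not state the order or architecture of the paper's SDMs, so this is an illustration of the general principle only. The oversampling ratio `OSR`, the test tone, and the block-average filter are likewise assumptions.

```python
import math


def first_order_sdm(samples):
    """First-order sigma-delta modulator: inputs in [-1, 1] -> list of +1/-1 bits."""
    integrator = 0.0
    feedback = 0.0  # previous output bit, fed back through an ideal 1-bit DAC
    bits = []
    for x in samples:
        integrator += x - feedback                     # integrate the tracking error
        feedback = 1.0 if integrator >= 0.0 else -1.0  # 1-bit quantizer
        bits.append(feedback)
    return bits


def block_average(values, block):
    """Crude decimation filter: mean of each non-overlapping block of samples."""
    n_blocks = len(values) // block
    return [sum(values[i * block:(i + 1) * block]) / block for i in range(n_blocks)]


OSR = 64  # oversampling ratio; assumed value for illustration

# A slow sine (two periods, amplitude 0.5) encoded as a 1-bit stream and then recovered.
tone = [0.5 * math.sin(2 * math.pi * n / (OSR * 32)) for n in range(OSR * 64)]
bitstream = first_order_sdm(tone)
recovered = block_average(bitstream, OSR)
reference = block_average(tone, OSR)
worst_error = max(abs(r - t) for r, t in zip(recovered, reference))
print("worst block-average error:", worst_error)
```

The local density of +1 bits tracks the input amplitude, so simple averaging already recovers the signal. Operations such as delaying and summing can therefore be applied to the 1-bit streams before any multibit conversion, which is the property that bitstream processing relies on.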