Hybrid Neural Networks for On-Device Directional Hearing

Authors: Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota
Pages: 11421-11430

AAAI 2022

Reproducibility Variable Result LLM Response
Research Type Experimental Our evaluation shows comparable performance to state-of-the-art causal inference models on synthetic data while achieving a 5x reduction of model size, 4x reduction of computation per second, 5x reduction in processing time and generalizing better to real hardware data. Further, our real-time hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms.
Researcher Affiliation Academia Anran Wang (1), Maruchi Kim (1), Hao Zhang (2), Shyamnath Gollakota (1); (1) University of Washington, (2) ETH Zürich
Pseudocode No The paper includes architectural diagrams in Figure 2 and describes the network process in text, but does not provide structured pseudocode or algorithm blocks.
Open Source Code No The paper does not provide an explicit statement of code release or a link to a repository for the methodology described in the paper.
Open Datasets Yes To gather a large amount of training data, we use software to simulate random reverberant noisy rooms using the image source model (Scheibler, Bezzam, and Dokmanić 2018). ...playing random speech utterances from the VCTK dataset (Veaux et al. 2017), while simulating diffuse noise from the MS-SNSD dataset (Reddy et al. 2019) and the WHAM! dataset (Wichern et al. 2019).
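The image source model cited above treats each wall reflection as a mirrored copy of the sound source. The following is a minimal first-order sketch under assumed values (a shoebox room with one corner at the origin, a speed of sound of 343 m/s, and hypothetical helper names); the cited package by Scheibler, Bezzam, and Dokmanić (Pyroomacoustics) also models higher-order reflections and wall absorption, which this sketch omits.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def first_order_images(source, room_dim):
    """Mirror the source across each of the six walls of a shoebox room
    spanning [0, room_dim[i]] on each axis; returns six image positions."""
    images = []
    for axis in range(3):
        low = list(source)
        low[axis] = -source[axis]  # reflection across the wall at coordinate 0
        images.append(tuple(low))
        high = list(source)
        high[axis] = 2 * room_dim[axis] - source[axis]  # wall at room_dim[axis]
        images.append(tuple(high))
    return images


def arrival_delays(source, mic, room_dim, fs):
    """Delays in samples: direct path first, then the six first-order reflections."""
    paths = [tuple(source)] + first_order_images(source, room_dim)
    return [math.dist(p, mic) / SPEED_OF_SOUND * fs for p in paths]


if __name__ == "__main__":
    # Hypothetical 5 x 4 x 3 m room, 16 kHz sampling rate.
    delays = arrival_delays((1, 2, 1.5), (4, 2, 1.5), (5, 4, 3), fs=16000)
    print([round(d, 1) for d in delays])
```

In this example the direct path is 3 m and each of the four side-wall reflections travels 5 m, so at 16 kHz those reflections arrive roughly 93 samples after the direct sound.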
Dataset Splits Yes We generate a total of 8000 clips as training set, 400 clips as validation set, and 200 clips as test set.
Hardware Specification Yes We deploy the models on two mobile development boards to measure the processing latency: a Raspberry Pi 4B with a four-core Cortex-A72 CPU and a development board with a four-core low-power Cortex-A55 CPU that supports FP16 operations, both running at 2 GHz.
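As a hedged illustration of what FP16 support means for stored model weights (not the authors' deployment code; the function names and the parameter count are hypothetical), the sketch below rounds values to IEEE 754 binary16 using Python's `struct` half-precision format `'e'` and compares raw weight storage in FP32 and FP16.

```python
import struct

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}  # IEEE 754 single vs half precision


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def weight_storage_bytes(num_params: int, dtype: str) -> int:
    """Raw weight storage, ignoring any framework or file-format overhead."""
    return num_params * BYTES_PER_PARAM[dtype]


if __name__ == "__main__":
    print(to_fp16(0.1))  # nearest half-precision value to 0.1
    print(weight_storage_bytes(1_000_000, "fp16"))  # hypothetical 1M-parameter model
```

Half precision halves weight storage and memory traffic, at the cost of keeping only about three significant decimal digits (10 stored mantissa bits).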
Software Dependencies No The paper mentions using PyTorch, TensorFlow, MNN from Alibaba, Arm NN, and PulseAudio but does not specify their version numbers.
Experiment Setup Yes The encoder and decoder both have a kernel size of 32 and a stride of 8. The rest of the hyperparameters are listed in Table 1. ... We use a 1:10 linear combination of scale-invariant signal-to-distortion ratio (SI-SDR) (Le Roux et al. 2019) and mean L1 loss as the training objective...
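The two settings in this row can be checked with short arithmetic. The sketch below assumes an unpadded 1-D convolutional encoder with a matching transposed-convolution decoder (padding is not stated here), and reads the 1:10 ratio as weight 1 on negative SI-SDR and weight 10 on mean L1 error; the opposite assignment is also consistent with the wording. All function names are hypothetical.

```python
import math

KERNEL, STRIDE = 32, 8  # encoder/decoder kernel size and stride from the paper


def encoder_frames(num_samples: int) -> int:
    """Frames produced by an unpadded strided 1-D convolution (no padding assumed)."""
    return (num_samples - KERNEL) // STRIDE + 1


def decoder_samples(num_frames: int) -> int:
    """Samples produced by the matching unpadded transposed convolution."""
    return (num_frames - 1) * STRIDE + KERNEL


def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant SDR in dB (Le Roux et al. 2019), computed on zero-mean signals."""
    n = len(ref)
    est_mean, ref_mean = sum(est) / n, sum(ref) / n
    est = [e - est_mean for e in est]
    ref = [r - ref_mean for r in ref]
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


def combined_loss(est, ref, w_sisdr=1.0, w_l1=10.0):
    """Weighted sum of negative SI-SDR and mean absolute error (weights assumed)."""
    l1 = sum(abs(e - r) for e, r in zip(est, ref)) / len(ref)
    return w_sisdr * -si_sdr(est, ref) + w_l1 * l1


if __name__ == "__main__":
    ref = [0.5, -0.25, 0.75, -1.0]
    print(encoder_frames(16000), decoder_samples(encoder_frames(16000)))
    print(combined_loss([0.45, -0.2, 0.8, -0.9], ref))
```

When (T - 32) is a multiple of 8, the decoder restores exactly T samples; for 16,000 input samples the encoder produces 1,997 frames. Because both the numerator and denominator of the SI-SDR ratio carry `eps`, a perfect estimate yields a large finite value rather than infinity.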