Hybrid Neural Networks for On-Device Directional Hearing
Authors: Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota
AAAI 2022, pp. 11421-11430
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluation shows comparable performance to state-of-the-art causal inference models on synthetic data while achieving a 5x reduction of model size, 4x reduction of computation per second, 5x reduction in processing time and generalizing better to real hardware data. Further, our real-time hybrid model runs in 8 ms on mobile CPUs designed for low-power wearable devices and achieves an end-to-end latency of 17.5 ms. |
| Researcher Affiliation | Academia | Anran Wang¹, Maruchi Kim¹, Hao Zhang², Shyamnath Gollakota¹; ¹University of Washington, ²ETH Zürich |
| Pseudocode | No | The paper includes architectural diagrams in Figure 2 and describes the network's processing steps in text, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement of code release or a link to a repository for the methodology described in the paper. |
| Open Datasets | Yes | To gather a large amount of training data, we use software to simulate random reverberant, noisy rooms using the image source model (Scheibler, Bezzam, and Dokmanić 2018). ...playing random speech utterances from the VCTK dataset (Veaux et al. 2017), while simulating diffuse noise from the MS-SNSD dataset (Reddy et al. 2019) and the WHAM! dataset (Wichern et al. 2019). |
| Dataset Splits | Yes | We generate a total of 8000 clips as training set, 400 clips as validation set, and 200 clips as test set. |
| Hardware Specification | Yes | We deploy the models on two mobile development boards to measure the processing latency: a Raspberry Pi 4B with a four-core Cortex-A72 CPU and a four-core low-power Cortex-A55 development board that supports FP16 operations, both running at 2 GHz. |
| Software Dependencies | No | The paper mentions using PyTorch, TensorFlow, MNN from Alibaba, Arm NN, and PulseAudio but does not specify their version numbers. |
| Experiment Setup | Yes | The encoder and decoder both have a kernel size of 32 and a stride of 8. The rest of the hyperparameters are listed in Table 1. ... We use a 1:10 linear combination of scale-invariant signal-to-distortion ratio (SI-SDR) (Le Roux et al. 2019) and mean L1 loss as the training objective... |
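The training objective in the Experiment Setup row can be sketched as follows. This is a minimal pure-Python illustration, not the authors' code, and it rests on several assumptions: SI-SDR is computed on zero-mean signals as defined by Le Roux et al. (2019); the SI-SDR term is negated so that minimizing the loss maximizes SI-SDR; and "1:10" is read as a weight of 1 on the SI-SDR term and 10 on the mean L1 term. The function names `si_sdr`, `mean_l1`, and `combined_loss` and the `eps` stabilizer are hypothetical.

```python
import math


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (Le Roux et al. 2019) on zero-mean signals."""
    n = len(reference)
    if n == 0 or len(estimate) != n:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the means so the measure also ignores DC offsets.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]
    # Optimal scaling of the reference toward the estimate: alpha = <est, ref> / ||ref||^2.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(v * v for v in residual)
    return 10 * math.log10((target_energy + eps) / (residual_energy + eps))


def mean_l1(estimate, reference):
    """Mean absolute sample error."""
    return sum(abs(e - r) for e, r in zip(estimate, reference)) / len(reference)


def combined_loss(estimate, reference, w_sisdr=1.0, w_l1=10.0):
    """Hypothetical 1:10 weighting; SI-SDR is negated so lower loss is better."""
    return -w_sisdr * si_sdr(estimate, reference) + w_l1 * mean_l1(estimate, reference)
```

Because SI-SDR is invariant to the gain of the estimate, the L1 term is what penalizes a wrong output level; that is one plausible reason to combine the two. A real training loop would express the same arithmetic with PyTorch tensor operations so gradients flow through both terms.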
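The encoder and decoder settings in the Experiment Setup row (kernel size 32, stride 8) determine how many latent frames a clip produces and how the decoder rebuilds a waveform. The sketch below assumes an unpadded, single-channel strided 1-D convolution for the encoder and its transposed counterpart (overlap-add) for the decoder. Padding, channel count, and causality details are not given in this summary, and every function name here is hypothetical.

```python
KERNEL = 32  # encoder/decoder kernel size reported in the paper
STRIDE = 8   # encoder/decoder stride reported in the paper


def encoder_frames(num_samples, kernel=KERNEL, stride=STRIDE):
    """Output frames of an unpadded strided 1-D convolution."""
    if num_samples < kernel:
        return 0
    return (num_samples - kernel) // stride + 1


def decoder_samples(num_frames, kernel=KERNEL, stride=STRIDE):
    """Output samples of an unpadded transposed 1-D convolution."""
    if num_frames == 0:
        return 0
    return (num_frames - 1) * stride + kernel


def conv1d_single_filter(x, w, stride=STRIDE):
    """One encoder channel: correlate x with filter w, hopping by `stride` samples."""
    k = len(w)
    return [
        sum(x[t * stride + j] * w[j] for j in range(k))
        for t in range(encoder_frames(len(x), k, stride))
    ]


def overlap_add(coeffs, w, stride=STRIDE):
    """One decoder channel: each coefficient scales w and is added at offset t * stride."""
    k = len(w)
    out = [0.0] * decoder_samples(len(coeffs), k, stride)
    for t, c in enumerate(coeffs):
        for j in range(k):
            out[t * stride + j] += c * w[j]
    return out
```

For example, 1024 input samples yield (1024 - 32) // 8 + 1 = 125 frames, and overlap-adding 125 frames restores 124 × 8 + 32 = 1024 samples. Consecutive frames overlap by 32 - 8 = 24 samples, so each output sample (away from the edges) is built from 32 / 8 = 4 frames.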