D4AM: A General Denoising Framework for Downstream Acoustic Models

Authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The experimental results show that D4AM consistently and effectively improves various unseen acoustic models and outperforms other combination setups.
Researcher Affiliation | Academia | National Taiwan University, Taipei, Taiwan; Academia Sinica, Taipei, Taiwan
Pseudocode | Yes | Algorithm 1: D4AM (A General Denoising Framework for Downstream Acoustic Models)
Open Source Code | Yes | Our code is available at https://github.com/ChangLee0903/D4AM.
Open Datasets | Yes | The training data comprise noise signals from DNS-Challenge (Reddy et al., 2020) and speech utterances from LibriSpeech (Panayotov et al., 2015).
Dataset Splits | Yes | For CHiME-4, performance was evaluated on the development and test sets of the 1st track.
Hardware Specification | No | The paper does not provide hardware details such as GPU or CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | The paper mentions software such as the SpeechBrain toolkit, DEMUCS, and the Adam optimizer, but does not specify version numbers for these components or libraries.
Experiment Setup | Yes | For pre-training, only Lreg was used to train the SE unit, with Libri-360 and Libri-500 as the clean speech corpus. The SE unit was trained for 500,000 steps using the Adam optimizer (β1 = 0.9, β2 = 0.999), learning rate 0.0002, gradient clipping value 1, and batch size 8. For fine-tuning, both Lreg and Lϕ_cls were used to re-train the SE model, initialized from the checkpoint selected in the pre-training stage. ... The SE unit was trained for 100,000 steps with the Adam optimizer (β1 = 0.9, β2 = 0.999), learning rate 0.0001, gradient clipping value 1, and batch size 16.
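The two-stage training recipe in the Experiment Setup row can be summarized as a configuration fragment. This is a hedged sketch: the dict layout and key names are illustrative assumptions, while the hyperparameter values (steps, betas, learning rates, gradient clipping, batch sizes) are taken from the paper's description.

```python
# Hedged sketch of the two-stage D4AM training setup (key names are assumptions;
# values are the hyperparameters reported in the paper).

PRETRAIN = {
    "losses": ["L_reg"],                       # regression loss only
    "clean_corpora": ["Libri-360", "Libri-500"],
    "steps": 500_000,
    "optimizer": "Adam",
    "betas": (0.9, 0.999),                     # Adam beta1, beta2
    "learning_rate": 2e-4,
    "grad_clip": 1.0,                          # gradient clipping value
    "batch_size": 8,
}

FINETUNE = {
    "losses": ["L_reg", "L_cls"],              # joint regression + classification
    "init_from": "selected pre-training checkpoint",
    "steps": 100_000,
    "optimizer": "Adam",
    "betas": (0.9, 0.999),
    "learning_rate": 1e-4,
    "grad_clip": 1.0,
    "batch_size": 16,
}
```

Note that fine-tuning halves the learning rate and doubles the batch size relative to pre-training, while keeping the optimizer and clipping settings fixed.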