D4AM: A General Denoising Framework for Downstream Acoustic Models
Authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. |
| Researcher Affiliation | Academia | 1National Taiwan University, Taipei, Taiwan 2Academia Sinica, Taipei, Taiwan |
| Pseudocode | Yes | Algorithm 1 D4AM (A General Denoising Framework for Downstream Acoustic Models) |
| Open Source Code | Yes | Our code is available at https://github.com/ChangLee0903/D4AM. |
| Open Datasets | Yes | The training datasets used in this study include: noise signals from DNS-Challenge (Reddy et al., 2020) and speech utterances from LibriSpeech (Panayotov et al., 2015). |
| Dataset Splits | Yes | For CHiME-4, we evaluated the performance on the development and test sets of the 1st track. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'SpeechBrain toolkit', 'DEMUCS', and the 'Adam optimizer', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | For pre-training, we only used L_reg to train the SE unit. At this stage, we selected Libri-360 and Libri-500 as the clean speech corpus. The SE unit was trained for 500,000 steps using the Adam optimizer with β1 = 0.9 and β2 = 0.999, learning rate 0.0002, gradient clipping value 1, and batch size 8. For fine-tuning, we used both L_reg and L_cls^ϕ to re-train the SE model initialized by the checkpoint selected from the pre-training stage. ... The SE unit was trained for 100,000 steps with the Adam optimizer with β1 = 0.9 and β2 = 0.999, learning rate 0.0001, gradient clipping value 1, and batch size 16. |
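The quoted fine-tuning recipe combines the signal-level regression objective L_reg with the downstream classification objective L_cls^ϕ under the optimizer settings listed in the table. The following is a minimal sketch of what one such fine-tuning step could look like in PyTorch, not the authors' implementation: `se_model`, `acoustic_model`, `regression_loss`, and the fixed weight `alpha` are illustrative placeholders, and D4AM itself derives the weighting coefficient automatically rather than fixing it.

```python
# Hedged sketch of a D4AM-style fine-tuning step (assumed names, not the authors' code).
# Assumptions: `se_model` is the pre-trained speech-enhancement unit, `acoustic_model`
# is a downstream recognizer exposing a recognition loss, and `regression_loss`
# compares enhanced speech against the clean reference.
import torch

def fine_tune_step(se_model, acoustic_model, regression_loss, optimizer,
                   noisy, clean, transcript, alpha=1.0):
    """One fine-tuning step: L_reg plus a weighted L_cls (fixed alpha is illustrative)."""
    enhanced = se_model(noisy)
    l_reg = regression_loss(enhanced, clean)           # signal-level regression loss
    l_cls = acoustic_model.loss(enhanced, transcript)  # recognition loss from downstream AM
    loss = l_reg + alpha * l_cls                       # D4AM adjusts this weighting automatically

    optimizer.zero_grad()
    loss.backward()
    # Gradient clipping value 1, as quoted in the table.
    torch.nn.utils.clip_grad_norm_(se_model.parameters(), max_norm=1.0)
    optimizer.step()
    return loss.item()

# Optimizer settings quoted for the fine-tuning stage; `se_model` is assumed to be
# initialized from the pre-training checkpoint.
# optimizer = torch.optim.Adam(se_model.parameters(), lr=1e-4, betas=(0.9, 0.999))
```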