Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
D4AM: A General Denoising Framework for Downstream Acoustic Models
Authors: Chi-Chang Lee, Yu Tsao, Hsin-Min Wang, Chu-Song Chen
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results show that D4AM can consistently and effectively provide improvements to various unseen acoustic models and outperforms other combination setups. |
| Researcher Affiliation | Academia | (1) National Taiwan University, Taipei, Taiwan; (2) Academia Sinica, Taipei, Taiwan |
| Pseudocode | Yes | Algorithm 1 D4AM (A General Denoising Framework for Downstream Acoustic Models) |
| Open Source Code | Yes | Our code is available at https://github.com/ChangLee0903/D4AM. |
| Open Datasets | Yes | The training datasets used in this study include: noise signals from DNS-Challenge (Reddy et al., 2020) and speech utterances from LibriSpeech (Panayotov et al., 2015). |
| Dataset Splits | Yes | For CHiME-4, we evaluated the performance on the development and test sets of the 1-st track. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper mentions software like the 'SpeechBrain toolkit' and 'DEMUCS' and the 'Adam optimizer', but it does not specify any version numbers for these software components or libraries. |
| Experiment Setup | Yes | For pre-training, we only used L_reg to train the SE unit. At this stage, we selected Libri-360 and Libri-500 as the clean speech corpus. The SE unit was trained for 500,000 steps using the Adam optimizer with β1 = 0.9 and β2 = 0.999, learning rate 0.0002, gradient clipping value 1, and batch size 8. For fine-tuning, we used both L_reg and L^ϕ_cls to re-train the SE model initialized by the checkpoint selected from the pre-training stage. ... The SE unit was trained for 100,000 steps with the Adam optimizer with β1 = 0.9 and β2 = 0.999, learning rate 0.0001, gradient clipping value 1, and batch size 16. |
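
The two training stages quoted in the Experiment Setup row can be collected into a small configuration record, which may help anyone attempting a reproduction. The following is a minimal sketch, not the authors' code: the names `StageConfig`, `PRETRAIN`, `FINETUNE`, and `total_samples` are hypothetical, and only the numeric values come from the quoted text. The part of the setup elided by "..." in the quote is not represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageConfig:
    """Hyperparameters for one training stage (hypothetical container)."""
    name: str
    losses: tuple          # labels of the loss terms used in this stage
    steps: int             # number of optimizer steps
    learning_rate: float
    batch_size: int
    adam_betas: tuple = (0.9, 0.999)   # beta1, beta2, same in both stages
    grad_clip: float = 1.0             # gradient clipping value


# Pre-training: only L_reg, 500,000 steps, learning rate 0.0002, batch size 8.
PRETRAIN = StageConfig(
    name="pretrain",
    losses=("L_reg",),
    steps=500_000,
    learning_rate=2e-4,
    batch_size=8,
)

# Fine-tuning: L_reg plus the classification loss, 100,000 steps,
# learning rate 0.0001, batch size 16.
FINETUNE = StageConfig(
    name="finetune",
    losses=("L_reg", "L_cls"),
    steps=100_000,
    learning_rate=1e-4,
    batch_size=16,
)


def total_samples(cfg: StageConfig) -> int:
    """Number of training examples processed in a stage (steps x batch size)."""
    return cfg.steps * cfg.batch_size
```

Keeping both stages in one immutable record makes the differences between them (loss terms, step count, learning rate, batch size) easy to compare, and `total_samples` gives a rough sense of the training budget per stage.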