Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
DOSE: Diffusion Dropout with Adaptive Prior for Speech Enhancement
Authors: Wenxin Tai, Yue Lei, Fan Zhou, Goce Trajcevski, Ting Zhong
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our approach yields substantial improvements in high-quality and stable speech generation, consistency with the condition factor, and inference efficiency. |
| Researcher Affiliation | Collaboration | Wenxin Tai¹, Yue Lei¹, Fan Zhou¹˒², Goce Trajcevski³, Ting Zhong¹˒²; ¹University of Electronic Science and Technology of China, ²Kashi Institute of Electronics and Information Industry, ³Iowa State University |
| Pseudocode | Yes | Algorithm 1 DOSE Training; Algorithm 2 DOSE Sampling |
| Open Source Code | Yes | Codes are publicly available at https://github.com/ICDM-UESTC/DOSE. |
| Open Datasets | Yes | Following previous works [4, 9, 8], we use the VoiceBank-DEMAND dataset [22, 23] for performance evaluations. To investigate the generalization ability of models, we use CHiME-4 [24] as another test dataset following [9], i.e., the models are trained on VoiceBank-DEMAND and evaluated on CHiME-4. |
| Dataset Splits | Yes | We select the best values for τ₁ and τ₂ according to the performance on a validation dataset, a small subset (10%) extracted from the training data. |
| Hardware Specification | Yes | We train all methods for 300,000 iterations using 1 NVIDIA RTX 3090 GPU with a batch size of 16 audios. |
| Software Dependencies | No | The paper mentions using 'DiffWave [7] as the basic architecture' but does not specify software dependencies like programming languages or libraries with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | We train all methods for 300,000 iterations using 1 NVIDIA RTX 3090 GPU with a batch size of 16 audios. DiffWave takes 50 steps with the linearly spaced training noise schedule β_t ∈ [1 × 10⁻⁴, 0.035] [4]. We select the best values for τ₁ and τ₂ according to the performance on a validation dataset, a small subset (10%) extracted from the training data. More experiment settings can be found in Appendix A.10. Specifically, the network is composed of 30 residual layers with residual channels 128. We use a bidirectional dilated convolution (Bi-DilConv) with kernel size 3 in each layer. We sum the skip connections from all residual layers. The total number of trainable parameters is 2.31M, slightly smaller than naive DiffWave (2.64M). |
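
The Experiment Setup row describes a 50-step, linearly spaced noise schedule running from 1 × 10⁻⁴ to 0.035. As a rough illustration of what such a schedule looks like, the sketch below builds it in plain Python and derives the cumulative signal-retention product used in standard diffusion models. The function names `linear_beta_schedule` and `cumulative_alphas` are hypothetical and are not taken from the DOSE repository. The sketch assumes both endpoints are included in the schedule, which the quoted text does not state explicitly.

```python
def linear_beta_schedule(num_steps=50, beta_start=1e-4, beta_end=0.035):
    """Return num_steps linearly spaced beta values, endpoints included (assumption)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return the running product of (1 - beta_t), the usual alpha-bar sequence."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


# Values quoted in the table: 50 steps, beta in [1e-4, 0.035].
betas = linear_beta_schedule(num_steps=50, beta_start=1e-4, beta_end=0.035)
alpha_bars = cumulative_alphas(betas)
```

Here `alpha_bars[t]` is the fraction of clean-signal variance remaining after step `t` under the standard forward process, so it decreases monotonically as the noise level grows.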