Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior
Authors: Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented PriorGrad based on the recently proposed diffusion-based speech generative models (Kong et al., 2021; Chen et al., 2021; Jeong et al., 2021), and conducted experiments on the LJSpeech (Ito & Johnson, 2017) dataset. The experimental results demonstrate the benefits of PriorGrad, such as a significantly faster model convergence during training, improved perceptual quality, and an improved tolerance to a reduction in network capacity. |
| Researcher Affiliation | Collaboration | Sang-gil Lee¹, Heeseung Kim¹, Chaehun Shin¹, Xu Tan², Chang Liu², Qi Meng², Tao Qin², Wei Chen², Sungroh Yoon¹,³, Tie-Yan Liu²; ¹Data Science & AI Lab., Seoul National University; ²Microsoft Research Asia; ³AIIS, ASRI, INMC, ISRC, NSI, and Interdisciplinary Program in Artificial Intelligence, Seoul National University |
| Pseudocode | Yes | Algorithms 1 and 2 describe the training and sampling procedures augmented by the data-dependent prior (µ, Σ). |
| Open Source Code | No | We followed the publicly available implementation³, where it uses a 2.62M parameter model... ³ https://github.com/lmnt-com/diffwave - This link refers to a baseline implementation (DiffWave), not explicitly the open-source code for the authors' proposed method (PriorGrad). |
| Open Datasets | Yes | We used LJSpeech (Ito & Johnson, 2017) dataset for all experiments, which is a commonly used open-source 24h speech dataset with 13,100 audio clips from a single female speaker. |
| Dataset Splits | Yes | We used 13,000 clips as the training set, 5 clips as the validation set, and the remaining 95 clips as the test set used for an objective and subjective audio quality evaluation. |
| Hardware Specification | Yes | Training for 1M iterations took approximately 7 days with a single NVIDIA A40 GPU. ... Training for 300K iterations took approximately 2 days on a single NVIDIA P100 GPU. |
| Software Dependencies | No | The paper mentions specific tools and libraries like Adam optimizer, Parallel WaveGAN, HiFi-GAN, MFA toolkit, SWIPE, and links to some open-source libraries, but does not provide specific version numbers for these software dependencies (e.g., PyTorch version, exact Adam version, or specific library versions like auraloss version X.Y). |
| Experiment Setup | Yes | We used the publicly available implementation³, where it uses a 2.62M parameter model with an Adam optimizer (Kingma & Ba, 2014) and a learning rate of 2 × 10⁻⁴ for a total of 1M iterations. ... We used the default diffusion steps with T = 50 and the linear beta schedule ranging from 1 × 10⁻⁴ to 5 × 10⁻² for training and inference... We also used the fast T_infer = 6 inference noise schedule... We conducted a comparative study of PriorGrad acoustic model with a different diffusion decoder network capacity, i.e., a small model with 3.5M parameters (128 residual channels), and a large model with 10M parameters (256 residual channels). |
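
The setup values quoted in the table (T = 50 diffusion steps, a linear beta schedule from 1 × 10⁻⁴ to 5 × 10⁻², and a 13,000 / 5 / 95 clip split) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and the in-order partition of clips is an assumption, since the quoted text does not say which clips fall into which split.

```python
# Illustrative sketch only. All names here are hypothetical and are not
# taken from the paper or from the DiffWave repository.

def linear_beta_schedule(num_steps=50, beta_start=1e-4, beta_end=5e-2):
    """Return num_steps evenly spaced noise variances from beta_start to beta_end."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """Return the running product of (1 - beta_t) over the diffusion steps."""
    products, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        products.append(running)
    return products


def split_clips(clip_ids, n_train=13000, n_valid=5):
    """Partition clips in order: train, validation, remainder as test.

    Assumption: the ordering rule is ours; only the split sizes are quoted.
    """
    train = clip_ids[:n_train]
    valid = clip_ids[n_train:n_train + n_valid]
    test = clip_ids[n_train + n_valid:]
    return train, valid, test


betas = linear_beta_schedule()
alpha_bars = cumulative_alphas(betas)
train, valid, test = split_clips(list(range(13100)))
print(len(betas), len(train), len(valid), len(test))  # prints: 50 13000 5 95
```

The running product shrinks monotonically toward zero, which is why more steps or larger betas leave less of the original signal at the final step. The split sizes follow directly from the 13,100 clips reported for LJSpeech.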