PriorGrad: Improving Conditional Denoising Diffusion Models with Data-Dependent Adaptive Prior
Authors: Sang-gil Lee, Heeseung Kim, Chaehun Shin, Xu Tan, Chang Liu, Qi Meng, Tao Qin, Wei Chen, Sungroh Yoon, Tie-Yan Liu
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented PriorGrad based on the recently proposed diffusion-based speech generative models (Kong et al., 2021; Chen et al., 2021; Jeong et al., 2021), and conducted experiments on the LJSpeech (Ito & Johnson, 2017) dataset. The experimental results demonstrate the benefits of PriorGrad, such as significantly faster model convergence during training, improved perceptual quality, and an improved tolerance to a reduction in network capacity. |
| Researcher Affiliation | Collaboration | Sang-gil Lee¹, Heeseung Kim¹, Chaehun Shin¹, Xu Tan², Chang Liu², Qi Meng², Tao Qin², Wei Chen², Sungroh Yoon¹,³, Tie-Yan Liu²; ¹Data Science & AI Lab., Seoul National University; ²Microsoft Research Asia; ³AIIS, ASRI, INMC, ISRC, NSI, and Interdisciplinary Program in Artificial Intelligence, Seoul National University |
| Pseudocode | Yes | Algorithms 1 and 2 describe the training and sampling procedures augmented by the data-dependent prior (µ, Σ). (A hedged training/sampling sketch is given after the table.) |
| Open Source Code | No | We followed the publicly available implementation3, where it uses a 2.62M parameter model... 3https://github.com/lmnt-com/diffwave - This link refers to a baseline implementation (DiffWave), not explicitly the open-source code for the authors' proposed method (PriorGrad). |
| Open Datasets | Yes | We used the LJSpeech (Ito & Johnson, 2017) dataset for all experiments, which is a commonly used open-source 24h speech dataset with 13,100 audio clips from a single female speaker. |
| Dataset Splits | Yes | We used 13,000 clips as the training set, 5 clips as the validation set, and the remaining 95 clips as the test set used for an objective and subjective audio quality evaluation. (An illustrative split snippet follows the table.) |
| Hardware Specification | Yes | Training for 1M iterations took approximately 7 days with a single NVIDIA A40 GPU. ... Training for 300K iterations took approximately 2 days on a single NVIDIA P100 GPU. |
| Software Dependencies | No | The paper mentions specific tools and libraries such as the Adam optimizer, Parallel WaveGAN, HiFi-GAN, the MFA toolkit, and SWIPE, and links to some open-source libraries, but it does not provide version numbers for these software dependencies (e.g., the PyTorch version or specific library versions such as auraloss version X.Y). |
| Experiment Setup | Yes | We used the publicly available implementation3, where it uses a 2.62M parameter model with an Adam optimizer (Kingma & Ba, 2014) and a learning rate of 2 × 10⁻⁴ for a total of 1M iterations. ... We used the default diffusion steps with T = 50 and the linear beta schedule ranging from 1 × 10⁻⁴ to 5 × 10⁻² for training and inference... We also used the fast T_infer = 6 inference noise schedule... We conducted a comparative study of the PriorGrad acoustic model with a different diffusion decoder network capacity, i.e., a small model with 3.5M parameters (128 residual channels) and a large model with 10M parameters (256 residual channels). (A configuration sketch corresponding to this setup follows the table.) |
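
The Pseudocode row refers to Algorithms 1 and 2 of the paper: training and sampling with a data-dependent Gaussian prior N(µ, diag(σ²)). Below is a minimal PyTorch sketch written from the description quoted in this table; the function names, the model signature `model(x_t, t, cond)`, and the use of the simple √β_t posterior noise scale are illustrative assumptions, not the authors' released code.

```python
import torch

# Diffusion schedule quoted in the Experiment Setup row: T = 50, linear betas in [1e-4, 5e-2].
T = 50
betas = torch.linspace(1e-4, 5e-2, T)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)


def training_step(model, x0, cond, mu, sigma):
    """One PriorGrad-style training step: diffuse x0 toward N(mu, diag(sigma^2)) and regress the noise."""
    b = x0.size(0)
    t = torch.randint(0, T, (b,), device=x0.device)
    a_bar = alpha_bars.to(x0.device)[t].view(b, *([1] * (x0.dim() - 1)))
    eps = sigma * torch.randn_like(x0)                    # noise drawn from N(0, diag(sigma^2))
    x_t = mu + torch.sqrt(a_bar) * (x0 - mu) + torch.sqrt(1.0 - a_bar) * eps
    eps_hat = model(x_t, t, cond)
    return ((eps - eps_hat) ** 2 / sigma ** 2).mean()     # L2 loss under the Sigma^{-1} norm


@torch.no_grad()
def sample(model, cond, mu, sigma):
    """Reverse process starting from the data-dependent prior x_T ~ N(mu, diag(sigma^2))."""
    x = mu + sigma * torch.randn_like(mu)
    for t in reversed(range(T)):
        beta, alpha, a_bar = betas[t], alphas[t], alpha_bars[t]
        t_batch = torch.full((x.size(0),), t, device=x.device)
        eps_hat = model(x, t_batch, cond)
        x = mu + ((x - mu) - beta / torch.sqrt(1.0 - a_bar) * eps_hat) / torch.sqrt(alpha)
        if t > 0:                                          # add sigma-scaled noise except at the final step
            x = x + torch.sqrt(beta) * sigma * torch.randn_like(x)
    return x
```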
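
For the Dataset Splits row, the 13,000/5/95 partition of LJSpeech's 13,100 clips can be reproduced along these lines. The paper does not state the exact clip ordering, so the sorted file listing and the directory layout of the standard LJSpeech-1.1 release are assumptions.

```python
from pathlib import Path

# Standard LJSpeech-1.1 layout is assumed; the exact ordering used by the authors is not specified.
wavs = sorted(Path("LJSpeech-1.1/wavs").glob("*.wav"))
assert len(wavs) == 13100

train_set = wavs[:13000]        # 13,000 training clips
val_set = wavs[13000:13005]     # 5 validation clips
test_set = wavs[13005:]         # remaining 95 clips for objective/subjective evaluation
```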
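
The Experiment Setup row corresponds roughly to the training-loop configuration sketched below, reusing `training_step` from the first sketch. `DummyEpsNet` and the random batch are placeholders standing in for the ~2.62M-parameter DiffWave-style network and an LJSpeech segment with its prior statistics (µ, σ); the fast T_infer = 6 inference schedule is mentioned in the row but its values are not quoted here, so it is omitted.

```python
import torch
import torch.nn as nn

class DummyEpsNet(nn.Module):
    """Placeholder for the ~2.62M-parameter dilated-convolution noise-prediction network."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv1d(1, 1, kernel_size=3, padding=1)

    def forward(self, x_t, t, cond):
        return self.net(x_t)

model = DummyEpsNet()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)   # Adam with learning rate 2 × 10⁻⁴

for step in range(3):                                       # the paper trains for 1M iterations; 3 shown here
    x0 = torch.randn(4, 1, 16000)                           # dummy waveform batch in place of LJSpeech audio
    cond = None                                             # conditioning (e.g., mel spectrogram) omitted
    mu, sigma = torch.zeros_like(x0), torch.ones_like(x0)   # dummy data-dependent prior statistics
    loss = training_step(model, x0, cond, mu, sigma)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```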