DiffWave: A Versatile Diffusion Model for Audio Synthesis
Authors: Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DiffWave on neural vocoding, unconditional and class-conditional generation tasks. |
| Researcher Affiliation | Collaboration | Zhifeng Kong (Computer Science and Engineering, UCSD, z4kong@eng.ucsd.edu); Wei Ping (NVIDIA, wping@nvidia.com); Jiaji Huang and Kexin Zhao (Baidu Research, {huangjiaji,kexinzhao}@baidu.com); Bryan Catanzaro (NVIDIA, bcatanzaro@nvidia.com) |
| Pseudocode | Yes | The paper provides Algorithm 1 (Training) and Algorithm 2 (Sampling); see the illustrative sketch after this table. |
| Open Source Code | No | The paper states 'Audio samples are in: https://diffwave-demo.github.io/', which is a demo page rather than a link to open-source code for the methodology. There is no explicit statement about a code release. |
| Open Datasets | Yes | We use the LJ speech dataset (Ito, 2017) that contains 24 hours of audio recorded in home environment with a sampling rate of 22.05 kHz. The paper also uses the Speech Commands dataset (Warden, 2018). |
| Dataset Splits | No | The paper uses the LJ Speech and Speech Commands datasets for training and evaluation. For LJ Speech it states 'We train DiffWave on 8 Nvidia 2080Ti GPUs using random short audio clips of 16,000 samples from each utterance' and, for MOS evaluation, that 'the test utterances from all models were presented to Mechanical Turk workers'. For Speech Commands it notes 'The SC09 dataset contains 31,158 training utterances' and later refers to a 'trainset' and 'testset' when training a separate classifier, but it does not provide explicit train/validation/test splits (percentages, counts, or splitting methodology) for the DiffWave models themselves. |
| Hardware Specification | Yes | We train Diff Wave on 8 Nvidia 2080Ti GPUs using random short audio clips of 16,000 samples from each utterance. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2015) with a batch size of 16 and learning rate 2 × 10⁻⁴. We train all DiffWave models for 1M steps. |
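
For context, the sketch below illustrates the kind of procedure referenced by the Pseudocode row (Algorithm 1: Training; Algorithm 2: Sampling), using the hyperparameters reported in the Experiment Setup row (Adam, learning rate 2 × 10⁻⁴, batch size 16, 16,000-sample clips). This is a minimal, generic DDPM-style sketch, not the paper's implementation: the `EpsilonTheta` module, the number of diffusion steps, and the noise-schedule values are illustrative placeholders.

```python
import torch

# Hypothetical stand-in for the DiffWave network. The real model is a WaveNet-like
# dilated-convolution architecture conditioned on the diffusion step and, for vocoding,
# a mel spectrogram; those details are omitted here.
class EpsilonTheta(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)

    def forward(self, x_t, t, mel=None):
        # A real implementation embeds t and mixes in mel conditioning.
        return self.net(x_t)

T = 50                                      # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.05, T)       # linear noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)    # cumulative product of alphas

model = EpsilonTheta()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # settings reported in the table

def training_step(x0):
    """Rough analogue of Algorithm 1: predict the noise injected at a random step."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a_bar = alpha_bar[t].view(b, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward diffusion q(x_t | x_0)
    loss = torch.mean((eps - model(x_t, t)) ** 2)           # unweighted L2 on the noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(length=16000):
    """Rough analogue of Algorithm 2: start from Gaussian noise and denoise step by step."""
    x = torch.randn(1, 1, length)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.tensor([t]))
        x = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            # posterior standard deviation of the reverse step (standard DDPM choice)
            sigma = ((1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]).sqrt()
            x = x + sigma * torch.randn_like(x)
    return x

# Example: one training step on a batch of 16 random 16,000-sample clips,
# mirroring the reported batch size and clip length.
loss = training_step(torch.randn(16, 1, 16000))
```

This only sketches the generic denoising-diffusion recipe; the paper's contributions are the specific waveform architecture and its fast sampling schedules, which are not reproduced here.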