DiffWave: A Versatile Diffusion Model for Audio Synthesis
Authors: Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, Bryan Catanzaro
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DiffWave on neural vocoding, unconditional and class-conditional generation tasks. |
| Researcher Affiliation | Collaboration | Zhifeng Kong (Computer Science and Engineering, UCSD, z4kong@eng.ucsd.edu); Wei Ping (NVIDIA, wping@nvidia.com); Jiaji Huang and Kexin Zhao (Baidu Research, {huangjiaji,kexinzhao}@baidu.com); Bryan Catanzaro (NVIDIA, bcatanzaro@nvidia.com) |
| Pseudocode | Yes | The paper provides Algorithm 1 (Training) and Algorithm 2 (Sampling); see the illustrative sketch after this table. |
| Open Source Code | No | The paper states 'Audio samples are in: https://diffwave-demo.github.io/', which is a demo page rather than a link to open-source code for the methodology. There is no explicit statement about a code release. |
| Open Datasets | Yes | We use the LJ speech dataset (Ito, 2017) that contains 24 hours of audio recorded in home environment with a sampling rate of 22.05 kHz. The paper also uses the Speech Commands dataset (Warden, 2018). |
| Dataset Splits | No | The paper uses the LJ Speech and Speech Commands datasets for training and evaluation. For LJ Speech it states 'We train DiffWave on 8 Nvidia 2080Ti GPUs using random short audio clips of 16,000 samples from each utterance' and, for MOS evaluation, that 'the test utterances from all models were presented to Mechanical Turk workers'. For Speech Commands it notes 'The SC09 dataset contains 31,158 training utterances' and later refers to a 'trainset' and 'testset' when training a separate classifier, but it does not provide explicit train/validation/test splits (percentages, counts, or splitting methodology) for the DiffWave models themselves. |
| Hardware Specification | Yes | We train Diff Wave on 8 Nvidia 2080Ti GPUs using random short audio clips of 16,000 samples from each utterance. |
| Software Dependencies | No | The paper mentions using the Adam optimizer but does not specify version numbers for any software dependencies or libraries used in their implementation. |
| Experiment Setup | Yes | We use Adam optimizer (Kingma & Ba, 2015) with a batch size of 16 and learning rate 2 × 10⁻⁴. We train all DiffWave models for 1M steps. |
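
For context, the sketch below illustrates the kind of procedure referenced by the Pseudocode row (Algorithm 1: Training; Algorithm 2: Sampling), using the hyperparameters reported in the Experiment Setup row (Adam, learning rate 2 × 10⁻⁴, batch size 16, 16,000-sample clips). This is a minimal, generic DDPM-style sketch, not the paper's implementation: the `EpsilonTheta` module, the number of diffusion steps, and the noise-schedule values are illustrative placeholders.

```python
import torch

# Hypothetical stand-in for the DiffWave network. The real model is a WaveNet-like
# dilated-convolution architecture conditioned on the diffusion step and, for vocoding,
# a mel spectrogram; those details are omitted here.
class EpsilonTheta(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv1d(1, 1, kernel_size=3, padding=1)

    def forward(self, x_t, t, mel=None):
        # A real implementation embeds t and mixes in mel conditioning.
        return self.net(x_t)

T = 50                                      # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.05, T)       # linear noise schedule (assumed values)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)    # cumulative product of alphas

model = EpsilonTheta()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)  # settings reported in the table

def training_step(x0):
    """Rough analogue of Algorithm 1: predict the noise injected at a random step."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    a_bar = alpha_bar[t].view(b, 1, 1)
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps   # forward diffusion q(x_t | x_0)
    loss = torch.mean((eps - model(x_t, t)) ** 2)           # unweighted L2 on the noise
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(length=16000):
    """Rough analogue of Algorithm 2: start from Gaussian noise and denoise step by step."""
    x = torch.randn(1, 1, length)
    for t in reversed(range(T)):
        eps_hat = model(x, torch.tensor([t]))
        x = (x - betas[t] / (1.0 - alpha_bar[t]).sqrt() * eps_hat) / alphas[t].sqrt()
        if t > 0:
            # posterior standard deviation of the reverse step (standard DDPM choice)
            sigma = ((1.0 - alpha_bar[t - 1]) / (1.0 - alpha_bar[t]) * betas[t]).sqrt()
            x = x + sigma * torch.randn_like(x)
    return x

# Example: one training step on a batch of 16 random 16,000-sample clips,
# mirroring the reported batch size and clip length.
loss = training_step(torch.randn(16, 1, 16000))
```

This only sketches the generic denoising-diffusion recipe; the paper's contributions are the specific waveform architecture and its fast sampling schedules, which are not reproduced here.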