MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
Authors: Kundan Kumar, Rithesh Kumar, Thibault de Boissiere, Lucas Gestin, Wei Zhen Teoh, Jose Sotelo, Alexandre de Brébisson, Yoshua Bengio, Aaron C. Courville
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Subjective evaluation metric (Mean Opinion Score, or MOS) shows the effectiveness of the proposed approach for high quality mel-spectrogram inversion. To establish the generality of the proposed techniques, we show qualitative results of our model in speech synthesis, music domain translation and unconditional music synthesis. We evaluate the various components of the model through ablation studies and suggest a set of guidelines to design general purpose discriminators and generators for conditional sequence synthesis tasks. Table 2: Mean Opinion Score of ablation studies. |
| Researcher Affiliation | Collaboration | Kundan Kumar (Lyrebird AI; Mila, University of Montreal) kundan@descript.com; Rithesh Kumar (Lyrebird AI) rithesh@descript.com; Thibault de Boissiere (Lyrebird AI); Lucas Gestin (Lyrebird AI); Wei Zhen Teoh (Lyrebird AI); Jose Sotelo (Lyrebird AI; Mila); Alexandre de Brébisson (Lyrebird AI; Mila); Yoshua Bengio (Mila, University of Montreal; CIFAR Program Co-director); Aaron Courville (Mila, University of Montreal; CIFAR Fellow) |
| Pseudocode | No | The paper describes the model architecture and training objective in prose and diagrams but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | To encourage reproducibility, we attach the code accompanying the paper: https://github.com/descriptinc/melgan-neurips. |
| Open Datasets | Yes | Each model is trained for 400k iterations on the LJ Speech dataset (Ito, 2017). We run an MOS hearing test for ground-truth mel-spectrogram inversion on the publicly available VCTK dataset (Veaux et al., 2017). |
| Dataset Splits | No | The paper mentions using LJ Speech and VCTK datasets and evaluates on a 'test set', but does not provide specific training/validation/test split percentages, sample counts, or explicit splitting methodology for reproduction. |
| Hardware Specification | Yes | We use an NVIDIA GTX 1080Ti for the GPU benchmark and an Intel(R) Core(TM) i9-7920X CPU @ 2.90GHz processor for the CPU benchmark, tested on only 1 CPU core. For all experiments, MelGAN was trained with batch size 16 on a single NVIDIA RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions 'pytorch implementation' and uses 'Adam as the optimizer' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all experiments, MelGAN was trained with batch size 16 on a single NVIDIA RTX 2080Ti GPU. We use Adam as the optimizer with a learning rate of 1e-4, β1 = 0.5, and β2 = 0.9 for both the generator and the discriminators. |
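The optimizer settings quoted above are enough to reconstruct the training configuration in PyTorch. A minimal sketch follows; the `Linear` modules are hypothetical stand-ins for the actual MelGAN generator and discriminators, which are defined in the released repository (https://github.com/descriptinc/melgan-neurips):

```python
import torch

# Placeholder modules: the real generator maps mel-spectrograms (80 bins)
# to raw waveform samples, and MelGAN uses multi-scale discriminators.
generator = torch.nn.Linear(80, 1)
discriminator = torch.nn.Linear(1, 1)

# Hyperparameters as reported in the paper: Adam with lr = 1e-4,
# beta1 = 0.5, beta2 = 0.9, for both generator and discriminators.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.5, 0.9))
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.5, 0.9))

BATCH_SIZE = 16  # batch size reported for training on a single RTX 2080 Ti
```

Note that β1 = 0.5 (rather than the Adam default of 0.9) is a common choice for GAN training; the paper does not report any learning-rate schedule.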