BigVGAN: A Universal Neural Vocoder with Large-Scale Training
Authors: Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive evaluation of BigVGAN for both in-distribution and out-of-distribution scenarios. We train BigVGAN and all baseline models on the full LibriTTS dataset. ... We report the performance of BigVGAN and the baseline models evaluated on LibriTTS using the above objective and subjective metrics. ... Table 2 shows the in-distribution test results on LibriTTS. ... Table 3 summarizes the SMOS results from three different types of unseen datasets. ... Table 4 shows the SMOS results from the 5 tracks and their average from the MUSDB18-HQ test set. ... Table 5: Ablation results on training data diversity using the 112M BigVGAN model, evaluated on LibriTTS. |
| Researcher Affiliation | Collaboration | Sang-gil Lee¹, Wei Ping², Boris Ginsburg², Bryan Catanzaro², Sungroh Yoon¹,³; ¹Data Science & AI Lab, Seoul National University (SNU); ²NVIDIA; ³AIIS, ASRI, INMC, ISRC, NSI, and Interdisciplinary Program in AI, SNU |
| Pseudocode | No | The paper provides detailed architectural diagrams and textual descriptions of the model components but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our code and model at: https://github.com/NVIDIA/BigVGAN. |
| Open Datasets | Yes | We use the LibriTTS (Zen et al., 2019) dataset with the original sampling rate of 24 kHz for training. |
| Dataset Splits | Yes | We perform objective evaluations on dev-clean and dev-other altogether, and conduct subjective evaluations on the combined test-clean and test-other. (A split-loading sketch follows the table.) |
| Hardware Specification | Yes | Table 1: Model footprint and synthesis speed for 24 kHz audio measured on an NVIDIA RTX 8000 GPU. (A timing sketch follows the table.) |
| Software Dependencies | No | No specific version numbers are provided for the software dependencies mentioned, such as 'librosa', 'Auraloss', 'python-pesq', 'python-MCD', the 'CARGAN' code, or the 'NVIDIA NeMo' toolkit. |
| Experiment Setup | Yes | We train all BigVGAN models including the ablation models and the baseline HiFi-GAN using our training configuration for 1M steps. We use a batch size of 32, a segment size of 8,192, and an initial learning rate of 1 × 10⁻⁴. ... Refer to Table 6 in Appendix A for detailed hyperparameters. (A configuration sketch follows the table.) |
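
The LibriTTS evaluation splits are easy to reassemble. Below is a minimal sketch using torchaudio's built-in `LIBRITTS` loader; the loader choice and the `ROOT` path are assumptions, since the paper does not say how the data was loaded.

```python
from torch.utils.data import ConcatDataset
import torchaudio

ROOT = "./data"  # hypothetical download location

def load_splits(urls):
    # Each LibriTTS subset ships at its native 24 kHz sampling rate.
    return ConcatDataset(
        [torchaudio.datasets.LIBRITTS(ROOT, url=u, download=True) for u in urls]
    )

# Objective metrics run on dev-clean + dev-other; subjective (MOS/SMOS)
# evaluations use the combined test-clean + test-other.
objective_set = load_splits(["dev-clean", "dev-other"])
subjective_set = load_splits(["test-clean", "test-other"])

waveform, sample_rate, *_ = objective_set[0]
assert sample_rate == 24000
```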
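Table 1's speed figures are tied to a specific GPU, so a reproduction should report its own real-time factor. The sketch below shows one way to measure it; the two-layer `ConvTranspose1d` stack is a stand-in for the generator, and the shapes are illustrative rather than the paper's.

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Placeholder mel-to-waveform upsampler (total upsampling factor 64 here);
# swap in the released BigVGAN generator for a real measurement.
model = nn.Sequential(
    nn.ConvTranspose1d(80, 32, 16, stride=8, padding=4),
    nn.ConvTranspose1d(32, 1, 16, stride=8, padding=4),
).to(device).eval()

mel = torch.randn(1, 80, 500, device=device)  # dummy mel-spectrogram input

with torch.no_grad():
    for _ in range(3):  # warm-up iterations before timing
        model(mel)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    audio = model(mel)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.time() - start

seconds_generated = audio.shape[-1] / 24_000  # 24 kHz output
print(f"real-time factor: {seconds_generated / elapsed:.1f}x")
```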
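The quoted setup pins down the top-level training configuration. A hedged sketch is below: the hyperparameters are the ones quoted above, while the stand-in generator and the AdamW choice are assumptions (the full recipe is in the paper's Table 6 and the released repository).

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the excerpt; all models (BigVGAN, ablations,
# and the HiFi-GAN baseline) were trained with this configuration for 1M steps.
config = dict(
    training_steps=1_000_000,
    batch_size=32,
    segment_size=8_192,  # raw-audio samples per training segment
    initial_lr=1e-4,
)

# Stand-in generator: the real architecture is in
# https://github.com/NVIDIA/BigVGAN (not reproduced here).
generator = nn.Sequential(nn.Conv1d(80, 512, 7, padding=3), nn.Tanh())

# AdamW is an assumption, not stated in the quoted excerpt.
optimizer = torch.optim.AdamW(generator.parameters(), lr=config["initial_lr"])
```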