BigVGAN: A Universal Neural Vocoder with Large-Scale Training

Authors: Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct a comprehensive evaluation of BigVGAN for both in-distribution and out-of-distribution scenarios. We train BigVGAN and all baseline models on the full LibriTTS dataset. ... We report the performance of BigVGAN and the baseline models evaluated on LibriTTS using the above objective and subjective metrics. ... Table 2 shows the in-distribution test results on LibriTTS. ... Table 3 summarizes the SMOS results from three different types of unseen dataset. ... Table 4 shows the SMOS results from the 5 tracks and their average from the MUSDB18-HQ test set. ... Table 5: Ablation results on training data diversity using the 112M BigVGAN model, evaluated on LibriTTS.
Researcher Affiliation | Collaboration | Sang-gil Lee¹, Wei Ping², Boris Ginsburg², Bryan Catanzaro², Sungroh Yoon¹,³ — ¹Data Science & AI Lab, Seoul National University (SNU); ²NVIDIA; ³AIIS, ASRI, INMC, ISRC, NSI, and Interdisciplinary Program in AI, SNU
Pseudocode | No | The paper provides detailed architectural diagrams and textual descriptions of the model components but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | We release our code and model at: https://github.com/NVIDIA/BigVGAN.
Open Datasets | Yes | We use the LibriTTS (Zen et al., 2019) dataset with the original sampling rate of 24 kHz for training.
Dataset Splits | Yes | We perform objective evaluations on dev-clean and dev-other altogether, and conduct subjective evaluations on the combined test-clean and test-other.
Hardware Specification | Yes | Table 1: Model footprint and synthesis speed for 24 kHz audio measured on an NVIDIA RTX 8000 GPU.
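Synthesis-speed figures like those in Table 1 are typically reported as a real-time factor (generated audio duration divided by wall-clock synthesis time). A minimal sketch of such a measurement, using a stand-in dummy vocoder rather than the paper's actual model (all names and the `hop_length` value are illustrative assumptions, not the released code):

```python
import time

SAMPLE_RATE = 24_000  # 24 kHz output, as in the paper


def dummy_vocoder(num_frames, hop_length=256):
    # Stand-in for a neural vocoder: pretend each mel frame
    # yields hop_length waveform samples. hop_length is illustrative.
    return [0.0] * (num_frames * hop_length)


def measure_rtf(num_frames=400, hop_length=256):
    """Real-time factor = audio seconds generated / wall-clock seconds.

    RTF > 1 means synthesis is faster than real time.
    """
    start = time.perf_counter()
    audio = dummy_vocoder(num_frames, hop_length)
    elapsed = time.perf_counter() - start
    audio_seconds = len(audio) / SAMPLE_RATE
    return audio_seconds / elapsed


if __name__ == "__main__":
    print(f"RTF: {measure_rtf():.1f}x real time")
```

With a real model, the timing window would wrap the GPU forward pass (plus a synchronization point) instead of the list construction above.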
Software Dependencies | No | No specific version numbers are provided for the software dependencies mentioned, such as 'librosa', 'Auraloss', 'python-pesq', 'python-MCD', the 'CARGAN' code, or the 'NVIDIA NeMo' toolkit.
Experiment Setup | Yes | We train all BigVGAN models, including the ablation models and the baseline HiFi-GAN, using our training configuration for 1M steps. We use a batch size of 32, a segment size of 8,192, and an initial learning rate of 1×10⁻⁴. ... Refer to Table 6 in Appendix A for detailed hyperparameters.
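The quoted setup can be collected into a small configuration sketch. The values come from the text above; the dict structure, field names, and the helper function are illustrative assumptions, not the released BigVGAN config:

```python
# Hyperparameters quoted in the row above; names are illustrative.
TRAIN_CONFIG = {
    "total_steps": 1_000_000,  # "for 1M steps"
    "batch_size": 32,
    "segment_size": 8_192,     # waveform samples per training segment
    "initial_lr": 1e-4,        # initial learning rate 1x10^-4
}


def segments_seen(step, batch_size=TRAIN_CONFIG["batch_size"]):
    # Total training segments consumed after `step` optimizer steps.
    return step * batch_size


if __name__ == "__main__":
    # A full 1M-step run at batch size 32 consumes 32M segments.
    print(segments_seen(TRAIN_CONFIG["total_steps"]))
```

Remaining hyperparameters (optimizer, schedules, loss weights) are deferred to Table 6 in the paper's Appendix A.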