Chunked Autoregressive GAN for Conditional Waveform Synthesis
Authors: Max Morrison, Rithesh Kumar, Kundan Kumar, Prem Seetharaman, Aaron Courville, Yoshua Bengio
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train HiFi-GAN (V1) on the VCTK dataset (Yamagishi et al., 2019) and evaluate pitch accuracy on 256 randomly selected sentences from a validation set containing speakers seen during training. We perform spectrogram-to-waveform inversion on speech. All models are trained with a batch size of 64. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 2 × 10⁻⁴ and β = (.8, .99). |
| Researcher Affiliation | Collaboration | Max Morrison (Northwestern University, morrimax@u.northwestern.edu); Rithesh Kumar, Kundan Kumar (1) & Prem Seetharaman (Descript, Inc., {rithesh, kundan, prem}@descript.com); Aaron Courville (1,2) & Yoshua Bengio (1,3); (1) Mila, Québec Artificial Intelligence Institute, Université de Montréal |
| Pseudocode | No | The paper describes the model architecture and experimental procedures in narrative text and figures, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We also make our code available under an open-source license. Code is available at https://github.com/descriptinc/cargan. In order to facilitate reproduction of our research, we provide documented, open-source code that permits reproducing and evaluating all experiments in our paper... |
| Open Datasets | Yes | We train HiFi-GAN (V1) on the VCTK dataset (Yamagishi et al., 2019) and evaluate on both VCTK and DAPS (Mysore, 2014). For evaluation on DAPS, we use the segmented dataset of the first script of the clean partition available on Zenodo (Morrison et al., 2021). |
| Dataset Splits | Yes | For training on VCTK, we randomly select 100 speakers. We train on a random 95% of the data from these 100 speakers, using data from both microphones. We evaluate pitch accuracy on 256 randomly selected sentences from a validation set containing speakers seen during training. (A minimal sketch of this split follows the table.) |
| Hardware Specification | Yes | We use a single RTX A6000 for training and generation on a GPU, and two cores of an AMD EPYC 7742 with one thread per core and a 2.25 GHz maximum clock speed for CPU benchmarking. |
| Software Dependencies | No | The paper mentions 'torchcrepe' and 'PyTorch Hub' but does not provide specific version numbers for all key software dependencies needed for reproducibility, such as PyTorch or Python. (A hedged torchcrepe usage sketch follows the table.) |
| Experiment Setup | Yes | All models are trained with a batch size of 64. We use the AdamW optimizer (Loshchilov & Hutter, 2017) with a learning rate of 2 × 10⁻⁴ and β = (.8, .99). We use an exponential learning rate schedule that multiplies the learning rate by .999 after each epoch. All models are trained for 500,000 steps. (A minimal sketch of this configuration follows the table.) |
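
The Dataset Splits row describes a concrete recipe: randomly select 100 VCTK speakers, then train on a random 95% of their utterances (using both microphones) and validate on the held-out 5%. Below is a minimal sketch of that split; the helper names, data structures, and seed are illustrative assumptions and do not come from the CARGAN codebase.

```python
import random

def make_vctk_split(speakers, utterances_by_speaker, num_speakers=100,
                    train_fraction=0.95, seed=0):
    """Randomly select speakers, then split their utterances 95/5.

    `speakers` is a list of speaker IDs; `utterances_by_speaker` maps a
    speaker ID to its utterance paths (both microphones included).
    All names here are illustrative, not from the authors' repository.
    """
    rng = random.Random(seed)

    # Randomly select 100 training speakers, per the paper.
    selected = rng.sample(speakers, num_speakers)

    # Pool all utterances from the selected speakers and shuffle.
    utterances = [u for s in selected for u in utterances_by_speaker[s]]
    rng.shuffle(utterances)

    # Train on a random 95%; the remainder forms a validation set
    # containing speakers seen during training.
    split = int(train_fraction * len(utterances))
    return utterances[:split], utterances[split:]
```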
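The Experiment Setup row pins down enough hyperparameters to reconstruct the optimizer and schedule. The PyTorch sketch below wires them together; the placeholder model, empty loss step, and steps-per-epoch value are assumptions, while the batch size, AdamW settings, 0.999 per-epoch decay, and 500,000-step budget come from the quoted text.

```python
import torch

# Placeholder model; the actual CARGAN generator and discriminators are
# defined in the authors' repository.
model = torch.nn.Linear(80, 1)

# AdamW with the paper's learning rate and betas.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, betas=(0.8, 0.99))

# Multiply the learning rate by .999 after each epoch.
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

BATCH_SIZE = 64        # per the paper
TOTAL_STEPS = 500_000  # per the paper
STEPS_PER_EPOCH = 1_000  # assumption; depends on dataset size

step = 0
while step < TOTAL_STEPS:
    for _ in range(STEPS_PER_EPOCH):
        # ... forward pass, GAN losses, loss.backward() ...
        optimizer.step()
        optimizer.zero_grad()
        step += 1
        if step == TOTAL_STEPS:
            break
    scheduler.step()  # decay once per epoch, as described
```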
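Pitch accuracy is evaluated with torchcrepe (unversioned, per the Software Dependencies row). As a sketch of what such an evaluation could look like, the function below extracts pitch with torchcrepe's public `predict` API and reports the mean absolute difference in cents; the hop length, frequency range, model capacity, and the cents metric itself are illustrative assumptions, not necessarily the paper's exact procedure.

```python
import torch
import torchcrepe

def pitch_error_cents(true_audio, fake_audio, sample_rate=22050):
    """Mean absolute pitch difference (in cents) between two waveforms.

    `true_audio` and `fake_audio` are float tensors of shape (1, samples).
    Hop length, frequency range, and model capacity below are illustrative
    choices, not the paper's documented settings.
    """
    hop_length = sample_rate // 100  # 10 ms frames (assumption)
    kwargs = dict(fmin=50., fmax=550., model='full',
                  batch_size=512, device='cpu')
    true_pitch = torchcrepe.predict(
        true_audio, sample_rate, hop_length, **kwargs)
    fake_pitch = torchcrepe.predict(
        fake_audio, sample_rate, hop_length, **kwargs)

    # 1200 cents per octave: compare frame-wise log-frequencies.
    cents = 1200. * torch.abs(torch.log2(fake_pitch / true_pitch))
    return cents.mean().item()
```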