Avocodo: Generative Adversarial Network for Artifact-Free Vocoder

Authors: Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | According to experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts. The performance of the proposed model for each dataset was assessed using various subjective and objective measurements.
Researcher Affiliation | Industry | Taejun Bak¹*, Junmo Lee²*, Hanbin Bae³, Jinhyeok Yang⁴, Jae-Sung Bae³, Young-Sun Joo¹ (¹AI Center, NCSOFT, Seongnam, Korea; ²SK Telecom, Seoul, Korea; ³Samsung Research, Seoul, Korea; ⁴Supertone Inc., Seoul, Korea) happyjun@ncsoft.com, ljun4121@sk.com
Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor any structured steps formatted as code.
Open Source Code | Yes | Source code is available at https://github.com/ncsoft/avocodo.
Open Datasets | Yes | The LJSpeech dataset (Ito and Johnson 2017) was used for the single-speaker experiment. ... The public English VCTK dataset (Yamagishi, Veaux, and MacDonald 2019) (Unseen (EN)) and an internal Korean dataset (Unseen (KR)) were used to evaluate the generalization of the proposed model.
Dataset Splits | No | The paper does not explicitly state training/validation/test splits with specific percentages, counts for all splits, or references to predefined splits. It mentions 'For the testset, 150 samples are randomly selected.', '9 speakers were selected for the testset.', and '16 unseen speakers were excluded from the training', but it does not specify a validation split or otherwise provide complete, reproducible split information.
Hardware Specification | Yes | Inference times for these two models are almost the same in CPU (Intel i7, 3.00GHz) and single-GPU (NVIDIA V100) environments. (A hedged timing sketch based on this description follows the table.)
Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, but does not provide specific version numbers for these or other key software dependencies required for replication.
Experiment Setup | Yes | An AdamW optimizer (Loshchilov and Hutter 2019) was used with an initial learning rate of 2×10⁻⁴. The optimizer parameters (β1, β2) were set to (0.8, 0.99), and an exponential learning rate decay of 0.999 was applied... 80-band mel-spectrograms were calculated from audio samples using the short-time Fourier transform (STFT). The STFT parameters for 22,050Hz audio were set to 1,024, 1,024, and 256 for the number of STFT bins, the window size, and the hop size, respectively. For 24,000Hz audio, the parameters were set to 2,048, 1,200, and 300, respectively. Each audio sample was sliced with the random window selection method. The segment size was 8,192 samples, which is about 0.4s long. ... λfm and λspec were set to 2 and 45, respectively. (A hedged configuration sketch based on these values follows the table.)
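
To make the reported experiment-setup values concrete, the following is a minimal PyTorch sketch of the 22,050Hz configuration: AdamW with (β1, β2) = (0.8, 0.99) and learning rate 2×10⁻⁴, exponential learning-rate decay of 0.999, an 80-band mel-spectrogram with 1,024/1,024/256 STFT settings, 8,192-sample random segments, and the λfm = 2, λspec = 45 loss weights. The placeholder generator and the use of torchaudio are assumptions made here for illustration; the authors' actual implementation is in the linked repository.

```python
# Hedged sketch of the reported optimizer / data settings (22,050 Hz case).
# The generator, dataset handling, and loss terms are placeholders,
# not the authors' code.
import torch
import torchaudio

SAMPLE_RATE = 22_050
N_FFT, WIN_LENGTH, HOP_LENGTH = 1024, 1024, 256   # reported STFT settings
N_MELS = 80                                        # 80 mel bands
SEGMENT_SIZE = 8_192                               # ~0.4 s of audio
LAMBDA_FM, LAMBDA_SPEC = 2.0, 45.0                 # reported loss weights

# Mel-spectrogram front end with the reported parameters.
mel_transform = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=N_FFT,
    win_length=WIN_LENGTH,
    hop_length=HOP_LENGTH,
    n_mels=N_MELS,
)

def random_window(waveform: torch.Tensor) -> torch.Tensor:
    """Random window selection: slice a fixed-size training segment.

    Assumes the waveform is at least SEGMENT_SIZE samples long.
    """
    start = torch.randint(0, waveform.size(-1) - SEGMENT_SIZE + 1, (1,)).item()
    return waveform[..., start:start + SEGMENT_SIZE]

# Placeholder generator standing in for the vocoder network.
generator = torch.nn.GRU(N_MELS, 256, batch_first=True)

# AdamW with the reported learning rate, betas, and exponential decay.
optimizer = torch.optim.AdamW(generator.parameters(), lr=2e-4, betas=(0.8, 0.99))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

# In training, the generator loss would combine an adversarial term,
# a feature-matching term scaled by LAMBDA_FM, and a spectrogram term
# scaled by LAMBDA_SPEC.
```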
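
The hardware row above only reports that CPU (Intel i7, 3.00GHz) and single-GPU (NVIDIA V100) inference times are almost the same. Below is a hedged sketch of how such a CPU-versus-GPU timing comparison could be run; the placeholder vocoder module and input shape are hypothetical and are not the paper's measurement script.

```python
# Hedged sketch for timing vocoder inference on CPU vs. GPU.
# The "vocoder" model and its input are placeholders; only the timing
# pattern (warm-up, synchronization, averaged runs) matters.
import time
import torch

def time_inference(model: torch.nn.Module, mel: torch.Tensor,
                   device: str, runs: int = 20) -> float:
    """Return the average per-run inference time in seconds on `device`."""
    model = model.to(device).eval()
    mel = mel.to(device)
    with torch.no_grad():
        model(mel)                      # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            model(mel)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs

# Placeholder "vocoder": maps an 80-band mel sequence to features.
vocoder = torch.nn.Sequential(torch.nn.Conv1d(80, 256, 7, padding=3),
                              torch.nn.Tanh())
mel = torch.randn(1, 80, 400)           # (batch, mel bins, frames)

print("CPU :", time_inference(vocoder, mel, "cpu"))
if torch.cuda.is_available():
    print("GPU :", time_inference(vocoder, mel, "cuda"))
```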