Avocodo: Generative Adversarial Network for Artifact-Free Vocoder
Authors: Taejun Bak, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, Young-Sun Joo
AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | According to experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts. ... The performance of the proposed model for each dataset was assessed using various subjective and objective measurements. |
| Researcher Affiliation | Industry | Taejun Bak1*, Junmo Lee2*, Hanbin Bae3, Jinhyeok Yang4, Jae-Sung Bae3, Young-Sun Joo1 — 1AI Center, NCSOFT, Seongnam, Korea; 2SK Telecom, Seoul, Korea; 3Samsung Research, Seoul, Korea; 4Supertone Inc., Seoul, Korea. happyjun@ncsoft.com, ljun4121@sk.com |
| Pseudocode | No | The paper does not contain any sections explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured steps formatted like code. |
| Open Source Code | Yes | Source code is available at https://github.com/ncsoft/avocodo. |
| Open Datasets | Yes | The LJSpeech (Ito and Johnson 2017) dataset was used for a single-speaker experiment. ... A public English dataset, i.e., VCTK (Yamagishi, Veaux, and MacDonald 2019) (Unseen(EN)), and an internal Korean dataset (Unseen(KR)) were used to evaluate the generalization of the proposed model. |
| Dataset Splits | No | The paper does not explicitly state the training/validation/test splits with specific percentages, counts for all splits, or references to predefined full splits. It mentions 'For the testset, 150 samples are randomly selected.', '9 speakers were selected for the testset.', and '16 unseen speakers were excluded from the training,' but does not specify a separate validation split or complete, reproducible split information. |
| Hardware Specification | Yes | Inference times for these two models are almost the same in CPU (Intel i7 CPU 3.00GHz) and single-GPU (NVIDIA V100) environments. |
| Software Dependencies | No | The paper mentions software components such as the AdamW optimizer, but does not provide specific version numbers for these or for other key software dependencies required for replication. |
| Experiment Setup | Yes | An AdamW optimizer (Loshchilov and Hutter 2019) was used with an initial learning rate of 2 × 10⁻⁴. The optimizer parameters (β1, β2) were set as (0.8, 0.99), and an exponential learning rate decay of 0.999 was applied... 80 bands of mel-spectrograms were calculated from audio samples using the short-time Fourier transform (STFT). The STFT parameters for 22,050 Hz were set as 1,024, 1,024, and 256 for the number of STFT bins, window size, and hop size, respectively. For 24,000 Hz, the parameters were set as 2,048, 1,200, and 300, respectively. Each audio sample was sliced with the random window selection method. The segment size was 8,192 samples, which is about 0.4 s long. ... λfm and λspec are set as 2 and 45, respectively. |
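The reported training hyperparameters can be set up in a few lines. A minimal sketch follows, assuming PyTorch as the framework (the paper's released code is PyTorch-based); the stand-in `generator` module and variable names are illustrative, not the authors' actual model:

```python
import torch

# Stand-in for the vocoder generator; the real Avocodo model is far larger.
generator = torch.nn.Conv1d(80, 1, kernel_size=7, padding=3)

# AdamW with initial lr 2e-4, betas (0.8, 0.99), exponential decay 0.999 per epoch,
# as quoted in the Experiment Setup row above.
optimizer = torch.optim.AdamW(generator.parameters(), lr=2e-4, betas=(0.8, 0.99))
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

# 22,050 Hz STFT settings: 1,024 FFT bins, 1,024 window, 256 hop.
n_fft, win_size, hop_size = 1024, 1024, 256
segment_size = 8192  # ~0.4 s per training slice (8192 / 22050 s)

audio = torch.randn(1, segment_size)  # dummy waveform segment
spec = torch.stft(
    audio,
    n_fft=n_fft,
    win_length=win_size,
    hop_length=hop_size,
    window=torch.hann_window(win_size),
    return_complex=True,
)
print(spec.shape)  # (batch, n_fft // 2 + 1, frames)
```

With centered framing (PyTorch's default), this yields 513 frequency bins and 33 frames per 8,192-sample segment; an 80-band mel filterbank would then be applied to obtain the mel-spectrogram input described in the paper.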