NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
Authors: Hyeong-Seok Choi, Jinhyeok Yang, Juheon Lee, Hyeongju Kim
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that the proposed framework offers competitive advantages such as controllability, data efficiency, and fast training convergence, while providing high quality synthesis. |
| Researcher Affiliation | Collaboration | Hyeong-Seok Choi¹,², *Jinhyeok Yang², *Juheon Lee¹,², *Hyeongju Kim²; ¹Seoul National University, ²Supertone, Inc., {kekepa15,yangyangii,juheon2,hyeongju}@supertone.ai |
| Pseudocode | No | The paper includes architectural diagrams (e.g., Figure 6, 7, 8) but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We sampled 30 speech and noise recordings from the VCTK and DEMAND datasets (Veaux et al., 2017; Thiemann et al., 2013), respectively, and mixed them with 5 dB signal-to-noise ratio (SNR). (A sketch of this mixing step appears below the table.) |
| Dataset Splits | Yes | We randomly selected 4800 and 600 speakers, and constructed training and validation sets respectively by merging their utterances and speech data of the NANSY++ backbone dataset. |
| Hardware Specification | Yes | The batch size was set to 60 using 10 RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions optimizers and tools like 'Adam optimizer (Kingma & Ba, 2014)' and 'Silero, 2021' but does not specify version numbers for general software dependencies or libraries. |
| Experiment Setup | Yes | We trained the backbone model for 1M iterations with the Adam optimizer (Kingma & Ba, 2014) and a learning rate of 10⁻⁴. The learning rate for MPD was 2×10⁻⁴. The batch size was set to 60 using 10 RTX 3090 GPUs. (See the optimizer sketch below the table.) |
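
The "Open Datasets" row refers to mixing VCTK speech with DEMAND noise at 5 dB SNR. Below is a minimal sketch of such a mixing step, assuming 1-D NumPy arrays at the same sample rate; the function name `mix_at_snr` and the noise-tiling behavior are illustrative assumptions, not the authors' code.

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float = 5.0) -> np.ndarray:
    """Scale `noise` so the speech/noise mixture has the requested SNR in dB."""
    # Assumption: loop or truncate the noise to match the speech length.
    if len(noise) < len(speech):
        noise = np.tile(noise, int(np.ceil(len(speech) / len(noise))))
    noise = noise[: len(speech)]

    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12

    # SNR(dB) = 10 * log10(speech_power / (gain**2 * noise_power))
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Scaling the noise rather than the speech keeps the speech level fixed, which is the usual convention when constructing noisy evaluation sets, though the paper does not state which signal was scaled.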
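The "Experiment Setup" row reports Adam with a learning rate of 10⁻⁴ for the backbone and 2×10⁻⁴ for the MPD, with a total batch size of 60 on 10 GPUs. A minimal sketch of these optimizer settings, assuming PyTorch (the paper does not name the framework) and using placeholder modules in place of the actual NANSY++ networks:

```python
import torch
from torch import nn, optim

# Placeholders: the real NANSY++ backbone and multi-period discriminator (MPD)
# architectures are not reproduced here.
backbone = nn.Linear(80, 80)
mpd = nn.Linear(80, 1)

# Reported settings: 1M iterations, Adam, lr 1e-4 (backbone) and 2e-4 (MPD).
opt_backbone = optim.Adam(backbone.parameters(), lr=1e-4)
opt_mpd = optim.Adam(mpd.parameters(), lr=2e-4)

# Batch size 60 over 10 RTX 3090 GPUs, i.e. 6 samples per GPU under standard
# data parallelism (an inference; the per-GPU split is not stated in the paper).
total_batch_size, num_gpus = 60, 10
per_gpu_batch_size = total_batch_size // num_gpus
```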