Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech
Authors: Yoonhyung Lee, Joongbo Shin, Kyomin Jung
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments conducted on LJSpeech dataset, we show that our model generates a mel-spectrogram 27 times faster than Tacotron 2 with similar speech quality. |
| Researcher Affiliation | Academia | Yoonhyung Lee, Joongbo Shin, Kyomin Jung Department of Electrical and Computer Engineering Seoul National University Seoul, South Korea {cpi1234, jbshin, kjung}@snu.ac.kr |
| Pseudocode | Yes | In Appendix A.1, pseudo-codes for the training and inference of BVAE-TTS are contained for detailed descriptions. |
| Open Source Code | Yes | https://github.com/LEEYOONHYUNG/BVAE-TTS |
| Open Datasets | Yes | In the experiments, we mainly use the LJSpeech dataset (Ito & Johnson, 2017) consisting of 12500 / 100 / 500 samples for training / validation / test splits, respectively. |
| Dataset Splits | Yes | In the experiments, we mainly use the LJSpeech dataset (Ito & Johnson, 2017) consisting of 12500 / 100 / 500 samples for training / validation / test splits, respectively. |
| Hardware Specification | Yes | Training of BVAE-TTS takes about 48 hours on Intel(R) Xeon(R) Gold 5120 CPU (2.2GHz) and NVIDIA V100 GPU on the Pytorch 1.16.0 library with Python 3.6.10 over the Ubuntu 16.04 LTS. |
| Software Dependencies | Yes | Training of BVAE-TTS takes about 48 hours on Intel(R) Xeon(R) Gold 5120 CPU (2.2GHz) and NVIDIA V100 GPU on the Pytorch 1.16.0 library with Python 3.6.10 over the Ubuntu 16.04 LTS. |
| Experiment Setup | Yes | We train the BVAE-TTS consisting of 4 BVAE blocks for 300K iterations with a batch size of 128. For an optimizer, we use the Adamax Optimizer (Kingma & Ba, 2015) with β1 = 0.9, β2 = 0.999 using the learning rate scheduling used in (Vaswani et al., 2017), where initial learning rate of 1e3 and warm-up step of 4000 are used. |
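The experiment-setup quote references the learning-rate scheduling of Vaswani et al. (2017), i.e. linear warm-up followed by inverse-square-root decay. The sketch below is a minimal, hedged reconstruction of that schedule, not the authors' actual code: the `scale` factor is an assumption (the quoted "initial learning rate of 1e3" is most plausibly a typo for 1e-3), and the function name `noam_lr` is our own label.

```python
def noam_lr(step: int, warmup: int = 4000, scale: float = 1e-3) -> float:
    """Noam-style schedule (Vaswani et al., 2017): the learning rate
    rises linearly for `warmup` steps, then decays as step**-0.5.
    `scale` is an assumed multiplier; the paper's "1e3" is ambiguous.
    """
    step = max(step, 1)  # guard against step 0
    return scale * min(step ** -0.5, step * warmup ** -1.5)
```

In PyTorch this factor could be attached to the quoted Adamax optimizer via `torch.optim.lr_scheduler.LambdaLR`; the schedule peaks exactly at `step == warmup` and decays thereafter.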