FPETS: Fully Parallel End-to-End Text-to-Speech System
Authors: Dabiao Ma, Zhiba Su, Wenxuan Wang, Yuhao Lu
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that FPETS utilizes the power of parallel computation and achieves a significant inference speed-up compared with state-of-the-art end-to-end TTS systems. |
| Researcher Affiliation | Collaboration | (1) Turing Robot Co., Ltd., Beijing, China {madabiao, suzhiba, luyuhao}@uzoo.cn (2) The Chinese University of Hong Kong, Shenzhen, Guangdong, China wenxuanwang1@link.cuhk.edu.cn |
| Pseudocode | No | The paper describes the model architecture and training strategy in text and diagrams, but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | Codes and demos will be released at https://github.com/suzhiba/Full-parallel 100x real time End2End TTS |
| Open Datasets | Yes | LJ Speech (Ito 2017) is a public speech dataset consisting of 13,100 pairs of text and 22,050 Hz audio clips. |
| Dataset Splits | No | The paper mentions using LJ speech dataset and various evaluation sets (Harvard Sentences, 100 random sentences), but it does not provide specific training/validation/test splits for the main dataset to reproduce the data partitioning. |
| Hardware Specification | Yes | All the experiments are done on 4 GTX 1080Ti GPUs |
| Software Dependencies | No | The paper mentions using Adam optimizer with specific parameters, but does not provide specific software dependencies like programming languages, libraries, or frameworks with version numbers. |
| Experiment Setup | Yes | Hyperparameters of our model are shown in Table 1. ... Each model is trained 300k steps. All the experiments are done on 4 GTX 1080Ti GPUs, with batch size of 32 sentences on each GPU. |
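The Experiment Setup row quotes the only training details given in the paper (Adam optimizer, 300k steps, 4 GTX 1080Ti GPUs, batch size 32 per GPU). The sketch below is a minimal, hypothetical reconstruction of that loop in PyTorch; everything not quoted above (the model stand-in, learning rate, dummy data) is a placeholder, not the authors' implementation.

```python
# Hypothetical sketch of the reported training setup. Only the quoted values
# (Adam optimizer, 300k steps, 4 GPUs, batch size 32 per GPU) come from the
# paper; the model, learning rate, and data below are placeholders.
import torch
import torch.nn as nn

NUM_GPUS = 4            # "4 GTX 1080Ti GPUs" (quoted)
BATCH_PER_GPU = 32      # "batch size of 32 sentences on each GPU" (quoted)
TOTAL_STEPS = 300_000   # "Each model is trained 300k steps" (quoted)

model = nn.Linear(80, 80)  # stand-in for the FPETS model (not specified here)
if torch.cuda.is_available():
    model = nn.DataParallel(model.cuda(), device_ids=list(range(NUM_GPUS)))

# Adam settings here are illustrative defaults; the paper's exact values are
# listed in its Table 1 and are not reproduced in this report.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(TOTAL_STEPS):
    batch = torch.randn(NUM_GPUS * BATCH_PER_GPU, 80)   # dummy input features
    target = torch.randn(NUM_GPUS * BATCH_PER_GPU, 80)  # dummy target features
    if torch.cuda.is_available():
        batch, target = batch.cuda(), target.cuda()
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(batch), target)
    loss.backward()
    optimizer.step()
```

This only illustrates the scale of the reported setup (effective batch of 128 sentences across 4 GPUs for 300k steps); reproducing the paper's results would additionally require the FPETS architecture and the Table 1 hyperparameters, which are not restated here.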