FPETS: Fully Parallel End-to-End Text-to-Speech System

Authors: Dabiao Ma, Zhiba Su, Wenxuan Wang, Yuhao Lu

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show FPETS utilizes the power of parallel computation and achieves a significant speed-up in inference compared with state-of-the-art end-to-end TTS systems.
Researcher Affiliation Collaboration 1 Turing Robot Co., Ltd., Beijing, China {madabiao, suzhiba, luyuhao}@uzoo.cn; 2 The Chinese University of Hong Kong, Shenzhen, Guangdong, China wenxuanwang1@link.cuhk.edu.cn
Pseudocode No The paper describes the model architecture and training strategy in text and diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code No Codes and demos will be released at https://github.com/suzhiba/Full-parallel 100x real time End2End TTS
Open Datasets Yes LJ Speech (Ito 2017) is a public speech dataset consisting of 13,100 pairs of text and 22,050 Hz audio clips.
Dataset Splits No The paper mentions using LJ speech dataset and various evaluation sets (Harvard Sentences, 100 random sentences), but it does not provide specific training/validation/test splits for the main dataset to reproduce the data partitioning.
Hardware Specification Yes All the experiments are done on 4 GTX 1080Ti GPUs
Software Dependencies No The paper mentions using Adam optimizer with specific parameters, but does not provide specific software dependencies like programming languages, libraries, or frameworks with version numbers.
Experiment Setup Yes Hyperparameters of our model are shown in Table 1. ... Each model is trained for 300k steps. All the experiments are done on 4 GTX 1080Ti GPUs, with a batch size of 32 sentences on each GPU.
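The reported setup (300k steps, 4 GPUs, 32 sentences per GPU) implies a training scale that can be sanity-checked against the 13,100-clip LJ Speech dataset. The sketch below is a back-of-the-envelope calculation, assuming data-parallel training where each optimizer step consumes one batch per GPU; it is an illustration, not a figure taken from the paper.

```python
# Back-of-the-envelope check of the reported training scale.
# Assumption (not stated in the paper): the 4 per-GPU batches are
# consumed together in one data-parallel optimizer step.
gpus = 4                 # "4 GTX 1080Ti GPUs"
batch_per_gpu = 32       # "batch size of 32 sentences on each GPU"
steps = 300_000          # "Each model is trained 300k steps"

effective_batch = gpus * batch_per_gpu      # 128 sentences per step
sentences_seen = effective_batch * steps    # 38,400,000 sentence updates

dataset_size = 13_100                       # LJ Speech clip count
epochs = sentences_seen / dataset_size      # roughly 2,931 passes over the data

print(effective_batch, sentences_seen, round(epochs))
```

Under this assumption the model sees the dataset on the order of three thousand times, which is consistent with the long training schedules typical of end-to-end TTS systems.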