FPETS: Fully Parallel End-to-End Text-to-Speech System

Authors: Dabiao Ma, Zhiba Su, Wenxuan Wang, Yuhao Lu

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results show FPETS utilizes the power of parallel computation and achieves a significant speed-up in inference compared with state-of-the-art end-to-end TTS systems.
Researcher Affiliation Collaboration 1 Turing Robot Co., Ltd., Beijing, China {madabiao, suzhiba, luyuhao}@uzoo.cn; 2 The Chinese University of Hong Kong, Shenzhen, Guangdong, China wenxuanwang1@link.cuhk.edu.cn
Pseudocode No The paper describes the model architecture and training strategy in text and diagrams, but does not include structured pseudocode or algorithm blocks.
Open Source Code No Codes and demos will be released at https://github.com/suzhiba/Full-parallel 100x real time End2End TTS
Open Datasets Yes LJ Speech (Ito 2017) is a public speech dataset consisting of 13,100 pairs of text and 22,050 Hz audio clips.
Dataset Splits No The paper mentions using LJ speech dataset and various evaluation sets (Harvard Sentences, 100 random sentences), but it does not provide specific training/validation/test splits for the main dataset to reproduce the data partitioning.
Hardware Specification Yes All the experiments are done on 4 GTX 1080Ti GPUs
Software Dependencies No The paper mentions using Adam optimizer with specific parameters, but does not provide specific software dependencies like programming languages, libraries, or frameworks with version numbers.
Experiment Setup Yes Hyperparameters of our model are shown in Table 1. ... Each model is trained for 300k steps. All the experiments are done on 4 GTX 1080Ti GPUs, with a batch size of 32 sentences on each GPU.
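The reported setup (300k steps, 4 GPUs, 32 sentences per GPU) implies a training scale that can be sanity-checked against the 13,100-clip LJ Speech dataset. The sketch below is a back-of-the-envelope calculation, assuming data-parallel training where each optimizer step consumes one batch per GPU; it is an illustration, not a figure taken from the paper.

```python
# Back-of-the-envelope check of the reported training scale.
# Assumption (not stated in the paper): the 4 per-GPU batches are
# consumed together in one data-parallel optimizer step.
gpus = 4                 # "4 GTX 1080Ti GPUs"
batch_per_gpu = 32       # "batch size of 32 sentences on each GPU"
steps = 300_000          # "Each model is trained 300k steps"

effective_batch = gpus * batch_per_gpu      # 128 sentences per step
sentences_seen = effective_batch * steps    # 38,400,000 sentence updates

dataset_size = 13_100                       # LJ Speech clip count
epochs = sentences_seen / dataset_size      # roughly 2,931 passes over the data

print(effective_batch, sentences_seen, round(epochs))
```

Under this assumption the model sees the dataset on the order of three thousand times, which is consistent with the long training schedules typical of end-to-end TTS systems.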