Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Authors: Chenfeng Miao, Shuang Liang, Zhengchen Liu, Minchuan Chen, Jun Ma, Shaojun Wang, Jing Xiao
ICML 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally show that the proposed models significantly outperform counterpart models such as Tacotron 2 (Shen et al., 2018) and Glow-TTS (Kim et al., 2020) in terms of speech quality, training efficiency and synthesis speed, while still producing the speeches of strong robustness and great diversity. |
| Researcher Affiliation | Industry | 1Ping An Technology. Correspondence to: Chenfeng Miao <miao EMAIL>. |
| Pseudocode | Yes | We show the implementation of each components in the following subsections and more details including the pseudocode in Appendix B. |
| Open Source Code | No | Audio samples of the proposed models are available at: https://mcf330.github.io/EfficientTTSAudioSamples/. No explicit statement about releasing the source code for the methodology or a link to a code repository. |
| Open Datasets | Yes | We conduct most of our experiments on an open-source standard Mandarin dataset from Data Baker2, which consists of 10,000 Chinese clips from a single female speaker with a sampling rate of 22.05 kHz. ... We also conduct some experiments using LJ-Speech dataset (Ito, 2017), which is a 24-hour waveform audio set of a single female speaker with 13,100 audio clips and a sample rate of 22.05 kHz. 2https://www.data-baker.com/open_source.html |
| Dataset Splits | No | The paper mentions using Data Baker and LJ-Speech datasets, but does not explicitly state the proportions or counts for training, validation, and test splits. It only gives the total size of the datasets (e.g., "10,000 Chinese clips", "24-hour waveform audio set"). |
| Hardware Specification | Yes | We run training and inference on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using the "HiFi-GAN (Kong et al., 2020) vocoder", "open-source implementations of Tacotron 2", and "Glow-TTS", but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | The temperature of latent variable z is set to 0.667 for both Glow-TTS and EFTS-Flow. ... η is a hyper-parameter which we set to 1.2 for all experiments. |