Non-Autoregressive Neural Text-to-Speech

Authors: Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present several experiments to evaluate the proposed ParaNet and WaveVAE. ... We report the MOS results in Table 2. ... We test synthesis speed of all models on NVIDIA GeForce GTX 1080 Ti with 32-bit floating point (FP32) arithmetic."
Researcher Affiliation | Industry | "Kainan Peng¹, Wei Ping¹, Zhao Song¹, Kexin Zhao¹ ... ¹Baidu Research, 1195 Bordeaux Dr, Sunnyvale, CA. ... Correspondence to: Wei Ping <weiping.thu@gmail.com>."
Pseudocode | No | The paper describes the model architectures and algorithms in text and figures, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | "Speech samples can be found in: https://parallel-neural-tts-demo.github.io/." ... "We use an open source reimplementation of FastSpeech¹ by adapting the hyperparameters for handling the 24 kHz dataset. ¹https://github.com/xcmyz/FastSpeech" The paper links to a demo page for speech samples and references a third-party open-source implementation of FastSpeech, but does not provide the source code for its own proposed ParaNet or WaveVAE models.
Open Datasets | No | "In our experiment, we use an internal English speech dataset containing about 20 hours of speech data from a female speaker with a sampling rate of 48 kHz."
Dataset Splits | No | The paper mentions training models for a certain number of steps and using audio clips, but does not explicitly specify training/validation/test dataset splits (e.g., percentages or sample counts per split).
Hardware Specification | Yes | "We test synthesis speed of all models on NVIDIA GeForce GTX 1080 Ti with 32-bit floating point (FP32) arithmetic. ... We train all neural vocoders on 8 Nvidia 1080Ti GPUs."
Software Dependencies | No | The paper mentions the Adam optimizer and certain architectures (e.g., WaveNet, ClariNet, WaveGlow), but does not specify software dependencies with version numbers (e.g., Python, TensorFlow/PyTorch versions).
Experiment Setup | Yes | "Table 1. Hyperparameters of autoregressive text-to-spectrogram model and non-autoregressive ParaNet in the experiment."