A Spectral Energy Distance for Parallel Speech Synthesis

Authors: Alexey Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we achieve state-of-the-art generation quality among implicit generative models, as judged by the recently proposed cFDSD metric. When combining our method with adversarial techniques, we also improve upon the recently proposed GAN-TTS model in terms of Mean Opinion Score as judged by trained human evaluators. We evaluate our proposed approach to speech synthesis by training 4 different models on the data set described in the previous section. (A sketch of the paper's generalized energy distance follows the table.)
Researcher Affiliation | Industry | Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner ({agritsenko,salimans,riannevdberg,jsnoek,nalk}@google.com), Google Research
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | An open source implementation of our generalized energy distance is available at https://github.com/google-research/google-research/tree/master/ged_tts.
Open Datasets | No | Our TTS models are trained on a single-speaker North American English dataset, consisting of speech data spoken by a professional voice actor. The data consists of approximately sixty thousand utterances with durations ranging from 0.5 seconds to 1 minute, together with their corresponding aligned linguistic features and pitch information. The paper does not provide a name, URL, DOI, or formal citation (with authors and year) under which this dataset is publicly accessible.
Dataset Splits | Yes | We report these metrics on both the training data and a larger validation set. We verified that this improvement is not due to overfitting by re-training the models on a smaller dataset (38.6 hours) and re-computing these metrics on a larger validation set (5.8 hours).
Hardware Specification | Yes | All of our models are trained on Cloud TPUs v3 with hyperparameters as described in Table 5 of the Appendix.
Software Dependencies | No | The paper mentions 'Our model parameters are updated using Adam [17]' but does not provide specific version numbers for the software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used for the implementation.
Experiment Setup | Yes | All of our models are trained on Cloud TPUs v3 with hyperparameters as described in Table 5 of the Appendix. Our model parameters are updated using Adam [17]. (An illustrative Adam update sketch also follows the table.)
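
Since the paper's central training objective is a generalized energy distance (GED) over multi-scale spectrograms, a minimal NumPy/SciPy sketch may make the reproducibility picture concrete. This is not the authors' implementation: the frame lengths, overlap, log epsilon, and the unweighted L1-plus-log-L2 combination below are illustrative assumptions; the authoritative code, including any per-scale weighting, lives in the ged_tts repository linked above.

    import numpy as np
    from scipy.signal import stft

    def multi_scale_spec_distance(a, b, frame_lengths=(64, 128, 256, 512, 1024, 2048)):
        # Spectrogram distance summed over several STFT resolutions.
        # Assumption: L1 on magnitudes plus L2 on log magnitudes, unweighted;
        # the paper's exact per-scale combination is in the ged_tts repository.
        total = 0.0
        for n in frame_lengths:
            _, _, sa = stft(a, nperseg=n, noverlap=3 * n // 4)
            _, _, sb = stft(b, nperseg=n, noverlap=3 * n // 4)
            ma, mb = np.abs(sa), np.abs(sb)
            total += np.sum(np.abs(ma - mb))  # L1 on magnitude spectrograms
            total += np.sqrt(np.sum((np.log(ma + 1e-5) - np.log(mb + 1e-5)) ** 2))  # L2 on log magnitudes
        return total

    def ged_estimate(y_real, y_gen_1, y_gen_2):
        # One-sample estimate of the generalized energy distance,
        # 2 * d(y, y') - d(y', y''), with y', y'' two i.i.d. model samples.
        # The repulsive -d(y', y'') term pushes samples apart, so the
        # minimizer matches the full conditional distribution instead of
        # collapsing to its mean.
        return (2.0 * multi_scale_spec_distance(y_real, y_gen_1)
                - multi_scale_spec_distance(y_gen_1, y_gen_2))

    # Toy usage on random 1-second "waveforms" at 24 kHz:
    rng = np.random.default_rng(0)
    y, g1, g2 = (rng.standard_normal(24000) for _ in range(3))
    print(ged_estimate(y, g1, g2))

In training, y_gen_1 and y_gen_2 would be two generator outputs for the same conditioning, obtained from independent noise draws.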
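
The paper specifies only that parameters are updated with Adam [17], deferring hyperparameters to Table 5 (not reproduced here). As a reference for the update rule itself, here is a minimal NumPy sketch of one Adam step; the learning rate is a placeholder and the moment decay rates are the common defaults from Kingma and Ba, not necessarily the paper's settings.

    import numpy as np

    def adam_update(params, grads, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
        # One Adam step; lr is a placeholder, not the paper's Table 5 value.
        m, v, t = state
        t += 1
        m = b1 * m + (1 - b1) * grads       # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * grads ** 2  # second-moment estimate
        m_hat = m / (1 - b1 ** t)           # bias correction
        v_hat = v / (1 - b2 ** t)
        params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
        return params, (m, v, t)

    # Toy usage: one step on f(w) = 0.5 * ||w||^2, whose gradient is w.
    w = np.ones(3)
    state = (np.zeros_like(w), np.zeros_like(w), 0)
    w, state = adam_update(w, w, state)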