A Spectral Energy Distance for Parallel Speech Synthesis
Authors: Alexey Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we achieve state-of-the-art generation quality among implicit generative models, as judged by the recently-proposed cFDSD metric. When combining our method with adversarial techniques, we also improve upon the recently-proposed GAN-TTS model in terms of Mean Opinion Score as judged by trained human evaluators. We evaluate our proposed approach to speech synthesis by training 4 different models on the data set described in the previous section. |
| Researcher Affiliation | Industry | Alexey A. Gritsenko, Tim Salimans, Rianne van den Berg, Jasper Snoek, Nal Kalchbrenner ({agritsenko,salimans,riannevdberg,jsnoek,nalk}@google.com), Google Research |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | An open source implementation of our generalized energy distance is available at https://github.com/google-research/google-research/tree/master/ged_tts. (A minimal sketch of this loss follows the table.) |
| Open Datasets | No | Our TTS models are trained on a single-speaker North American English dataset, consisting of speech data spoken by a professional voice actor. The data consists of approximately sixty thousand utterances with durations ranging approximately from 0.5 seconds to 1 minute, and their corresponding aligned linguistic features and pitch information. The paper does not provide a specific name, URL, DOI, or formal citation (authors and year) under which this dataset is publicly accessible. |
| Dataset Splits | Yes | We report these metrics on both the training data and a larger validation set. We verified that this improvement is not due to overfitting by re-training the models on a smaller dataset (38.6 hours) and re-computing these metrics on a larger validation set (5.8 hours). |
| Hardware Specification | Yes | All of our models are trained on Cloud TPUs v3 with hyperparameters as described in Table 5 of the Appendix. |
| Software Dependencies | No | The paper mentions 'Our model parameters are updated using Adam [17]' but does not provide specific version numbers for software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used for implementation. |
| Experiment Setup | Yes | All of our models are trained on Cloud TPUs v3 with hyperparameters as described in Table 5 of the Appendix. Our model parameters are updated using Adam [17]. |
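
For orientation, below is a minimal NumPy sketch of the generalized energy distance training loss referenced in the Open Source Code row. Only the overall form L = d(x, y1) + d(x, y2) - d(y1, y2), with two independent model samples y1 and y2 for the same conditioning, follows the paper; the helper names (`stft_magnitude`, `spectral_distance`), the window sizes, and the term weighting are illustrative assumptions, not the released implementation at the repository above.

```python
import numpy as np

def stft_magnitude(x, frame_length, hop):
    """Magnitude spectrogram of a 1-D waveform via a Hann-windowed framed FFT."""
    window = np.hanning(frame_length)
    n_frames = 1 + (len(x) - frame_length) // hop
    frames = np.stack([x[i * hop:i * hop + frame_length] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))

def spectral_distance(a, b, frame_lengths=(256, 512, 1024)):
    """Multi-scale spectrogram distance between two waveforms: L1 on
    log-magnitudes plus an L2 magnitude term, summed over window sizes.
    The exact combination and weights in the paper may differ; this is
    an illustrative choice."""
    total = 0.0
    for n in frame_lengths:
        sa = stft_magnitude(a, n, hop=n // 4)
        sb = stft_magnitude(b, n, hop=n // 4)
        total += np.mean(np.abs(np.log(sa + 1e-5) - np.log(sb + 1e-5)))
        total += np.sqrt(np.sum((sa - sb) ** 2))
    return total

def ged_loss(x, y1, y2):
    """Monte-Carlo generalized energy distance loss
        L = d(x, y1) + d(x, y2) - d(y1, y2),
    where y1, y2 are two independent model samples drawn for the same
    conditioning signal. The E[d(x, x')] term of the full energy distance
    is constant in the model parameters and is therefore dropped."""
    return (spectral_distance(x, y1) + spectral_distance(x, y2)
            - spectral_distance(y1, y2))

# Toy usage: white-noise stand-ins for the real waveform and two samples.
rng = np.random.default_rng(0)
x, y1, y2 = (rng.standard_normal(16000) for _ in range(3))
print(ged_loss(x, y1, y2))
```

The repulsive d(y1, y2) term is what distinguishes this loss from a plain spectrogram regression: it rewards diversity between independent samples and is what makes the objective a proper scoring rule rather than a mode-averaging reconstruction loss.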