Semi-Supervised Generative Modeling for Controllable Speech Synthesis

Authors: Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby

ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "To evaluate the efficacy of semi-supervised latent variable models for controllable TTS we trained the model described in section 2 on the above data-sets at varying levels of supervision as well as for varying settings of the hyperparameters: α, which controls the supervision loss, and γ, which over-emphasizes supervised training points." (See the objective sketch after this table.)
Researcher Affiliation | Collaboration | Raza Habib (University College London, UCL); Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, and Tom Bagby (Google Research).
Pseudocode | No | The paper describes the model architecture and training procedure in text and figures but does not include explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper links to audio samples and a demo page, but provides no link to, or statement about releasing, source code for the described method.
Open Datasets | Yes | "To verify the reproducibility of our results on a public dataset, we trained models to control speaking rate and F0 variation on the clean subset of the LibriTTS dataset (Zen et al., 2019)." (See the label-extraction sketch after this table.)
Dataset Splits | Yes | "The training set consists of 72,405 utterances with durations of at most 5 seconds (45 hours). The validation and test sets each contain 745 utterances or roughly 30 minutes of data." (See the duration-filter sketch after this table.)
Hardware Specification | Yes | "All models were trained using the ADAM optimizer with learning rate of 10^-3 and run for 300,000 training steps with a batch size of 256, distributed across 32 Google Cloud TPU chips."
Software Dependencies | No | "All models were implemented using tensorflow 1 (Abadi et al., 2016)." The paper names TensorFlow 1 but gives no specific version number for TensorFlow or any other software library.
Experiment Setup | Yes | "All models were trained using the ADAM optimizer with learning rate of 10^-3 and run for 300,000 training steps with a batch size of 256, distributed across 32 Google Cloud TPU chips." (See the training-loop sketch below.)
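
The Research Type row quotes two hyperparameters, α (the weight on the supervision loss) and γ (extra emphasis on supervised training points), without showing how they enter the objective. The following is only a minimal NumPy sketch of one plausible way to combine an unsupervised ELBO term with an α-weighted supervised term while over-weighting labelled examples by γ; the function and variable names are illustrative and not taken from the paper.

```python
import numpy as np

def semi_supervised_loss(elbo_per_example, sup_loss_per_example,
                         is_supervised, alpha=1.0, gamma=1.0):
    """Hypothetical combination of the unsupervised and supervised terms.

    elbo_per_example:     negative ELBO for each utterance in the batch, shape [B]
    sup_loss_per_example: regression loss on the observed latent attributes, shape [B]
                          (only meaningful where is_supervised is True)
    is_supervised:        boolean mask, shape [B]
    alpha:                weight on the supervision loss
    gamma:                extra weight on supervised training points
    """
    mask = is_supervised.astype(np.float32)
    # Over-emphasize supervised examples by gamma; unsupervised ones keep weight 1.
    example_weights = 1.0 + (gamma - 1.0) * mask
    unsupervised_term = example_weights * elbo_per_example
    supervised_term = alpha * mask * example_weights * sup_loss_per_example
    return np.mean(unsupervised_term + supervised_term)

# Toy usage with random per-example losses for a batch of 4 utterances.
rng = np.random.default_rng(0)
loss = semi_supervised_loss(
    elbo_per_example=rng.uniform(size=4).astype(np.float32),
    sup_loss_per_example=rng.uniform(size=4).astype(np.float32),
    is_supervised=np.array([True, False, False, True]),
    alpha=10.0,
    gamma=4.0,
)
print(f"toy semi-supervised loss: {loss:.3f}")
```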
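
The Open Datasets row says the public-data experiments control speaking rate and F0 variation on the LibriTTS clean subset, but the quoted excerpt does not specify how those supervision signals were computed. The snippet below is an assumed recipe only, using librosa's pyin pitch tracker for F0 variation and a crude words-per-second proxy for speaking rate; the authors' actual feature extraction may differ.

```python
import librosa
import numpy as np

def f0_variation(wav_path, fmin=50.0, fmax=500.0):
    """Assumed F0-variation label: std. dev. of voiced log-F0 from pyin."""
    audio, sr = librosa.load(wav_path, sr=None)
    f0, voiced_flag, _ = librosa.pyin(audio, fmin=fmin, fmax=fmax, sr=sr)
    voiced_f0 = f0[voiced_flag & ~np.isnan(f0)]
    return float(np.std(np.log(voiced_f0))) if voiced_f0.size else 0.0

def speaking_rate(transcript, wav_path):
    """Assumed speaking-rate label: words per second (a rough proxy for phone rate)."""
    audio, sr = librosa.load(wav_path, sr=None)
    return len(transcript.split()) / (len(audio) / sr)

# Example usage (paths are placeholders):
# print(f0_variation("LibriTTS/train-clean-100/.../utt.wav"))
# print(speaking_rate("the quick brown fox", "LibriTTS/train-clean-100/.../utt.wav"))
```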
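
The Dataset Splits row states that training utterances were capped at 5 seconds, but no file lists are published. The sketch below shows one way such a duration filter could be applied to LibriTTS-style PCM WAV files using only the Python standard library; the resulting split is not guaranteed to match the authors' 72,405/745/745 partition.

```python
import wave
from pathlib import Path

MAX_SECONDS = 5.0  # duration cap quoted in the paper

def wav_duration(path):
    """Duration in seconds of a PCM WAV file (LibriTTS ships 24 kHz PCM WAVs)."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate()

def short_utterances(root):
    """Yield WAV paths under `root` whose audio is at most MAX_SECONDS long."""
    for path in sorted(Path(root).rglob("*.wav")):
        if wav_duration(path) <= MAX_SECONDS:
            yield path

# Example usage (placeholder path):
# kept = list(short_utterances("LibriTTS/train-clean-100"))
# print(f"{len(kept)} utterances of at most {MAX_SECONDS} s")
```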
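
The Hardware Specification and Experiment Setup rows quote the same recipe: Adam with learning rate 10^-3, 300,000 steps, and batch size 256 distributed over 32 Cloud TPU chips. Since the paper only names TensorFlow 1 without further detail, the skeleton below reproduces those optimizer settings on a single device and leaves the TPU distribution and the model itself (`build_model_loss`) as placeholders.

```python
import tensorflow as tf  # TensorFlow 1.x, as cited in the paper (Abadi et al., 2016)

LEARNING_RATE = 1e-3
TRAIN_STEPS = 300_000
BATCH_SIZE = 256  # in the paper this batch is spread across 32 Google Cloud TPU chips

def train(build_model_loss):
    """Single-device sketch of the quoted optimization settings."""
    loss = build_model_loss(batch_size=BATCH_SIZE)  # placeholder model-building fn
    global_step = tf.train.get_or_create_global_step()
    optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
    train_op = optimizer.minimize(loss, global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for step in range(TRAIN_STEPS):
            loss_value, _ = sess.run([loss, train_op])
            if step % 1000 == 0:
                print(f"step {step}: loss {loss_value:.4f}")
```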