Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Authors: Raza Habib, Soroosh Mariooryad, Matt Shannon, Eric Battenberg, RJ Skerry-Ryan, Daisy Stanton, David Kao, Tom Bagby
ICLR 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the efficacy of semi-supervised latent variable models for controllable TTS, we trained the model described in Section 2 on the above datasets at varying levels of supervision, as well as for varying settings of the hyperparameters: α, which controls the supervision loss, and γ, which over-emphasizes supervised training points. (A hedged sketch of this objective appears after the table.) |
| Researcher Affiliation | Collaboration | Raza Habib¹, Soroosh Mariooryad², Matt Shannon², Eric Battenberg², RJ Skerry-Ryan², Daisy Stanton², David Kao², Tom Bagby²; ¹University College London (UCL), ²Google Research. |
| Pseudocode | No | The paper describes the model architecture and training procedure in text and figures but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides links to audio samples and a demo page, but does not provide a specific link or statement for the release of source code for the methodology described. |
| Open Datasets | Yes | To verify the reproducibility of our results on a public dataset, we trained models to control speaking rate and F0 variation on the clean subset of the LibriTTS dataset (Zen et al., 2019). |
| Dataset Splits | Yes | The training set consists of 72,405 utterances with durations of at most 5 seconds (45 hours). The validation and test sets each contain 745 utterances, or roughly 30 minutes of data. (A sketch of the duration filter appears after the table.) |
| Hardware Specification | Yes | All models were trained using the ADAM optimizer with a learning rate of 10⁻³ and run for 300,000 training steps with a batch size of 256, distributed across 32 Google Cloud TPU chips. |
| Software Dependencies | No | All models were implemented using TensorFlow 1 (Abadi et al., 2016). The paper names TensorFlow 1 but does not give a specific version number for it or for any other software libraries used. |
| Experiment Setup | Yes | All models were trained using the ADAM optimizer with a learning rate of 10⁻³ and run for 300,000 training steps with a batch size of 256, distributed across 32 Google Cloud TPU chips. (A training-configuration sketch follows the table.) |
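
The supervision weights quoted in the Research Type row follow the usual semi-supervised VAE recipe (Kingma et al., 2014): an ELBO over unlabeled data plus a label term weighted by α, with supervised points over-weighted by γ. Below is a minimal sketch of how the two hyperparameters might enter the objective; the function and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def semi_supervised_objective(elbo_unsup, elbo_sup, log_q_y_given_x,
                              alpha=1.0, gamma=1.0):
    """Hypothetical combined objective for a semi-supervised VAE.

    elbo_unsup:      ELBO terms for unlabeled utterances, shape [U]
    elbo_sup:        ELBO terms for labeled utterances, shape [S]
    log_q_y_given_x: log-probability of the observed labels y under the
                     inference network q(y|x), shape [S]
    alpha:           weight on the supervision loss (the paper's α)
    gamma:           over-weights supervised training points (the paper's γ)
    """
    unsup = np.sum(elbo_unsup)
    sup = gamma * np.sum(elbo_sup + alpha * log_q_y_given_x)
    return unsup + sup  # maximize this, or minimize its negation

# Toy usage with random stand-in values.
rng = np.random.default_rng(0)
obj = semi_supervised_objective(rng.normal(size=100), rng.normal(size=10),
                                rng.normal(size=10), alpha=10.0, gamma=4.0)
```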
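For the LibriTTS splits quoted in the Dataset Splits row, a hedged sketch of the ≤5-second duration filter is shown below. The helper name and the use of the `soundfile` package are assumptions; the paper does not describe its preprocessing code.

```python
import soundfile as sf

MAX_SECONDS = 5.0  # the paper keeps training utterances of at most 5 seconds

def within_duration_limit(wav_path: str, max_seconds: float = MAX_SECONDS) -> bool:
    """Return True if the utterance is short enough for the training set."""
    info = sf.info(wav_path)  # reads the file header only, not the audio
    return info.frames / info.samplerate <= max_seconds

# Hypothetical usage over a list of LibriTTS clean-subset paths:
# train_paths = [p for p in all_train_paths if within_duration_limit(p)]
```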
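Finally, the optimizer settings reported in the Hardware Specification and Experiment Setup rows map onto the TensorFlow 1 API the paper cites roughly as follows. This is a sketch using a stand-in `total_loss` so the snippet is self-contained; the actual model graph and TPU distribution code are not published.

```python
import tensorflow as tf  # TensorFlow 1.x, as referenced in the paper

BATCH_SIZE = 256       # global batch, spread over 32 Cloud TPU chips in the paper
TRAIN_STEPS = 300_000  # total training steps
LEARNING_RATE = 1e-3   # ADAM learning rate of 10^-3

# Stand-in loss; in the paper this would be the semi-supervised
# objective sketched above.
features = tf.placeholder(tf.float32, shape=[BATCH_SIZE, 80])
weights = tf.Variable(tf.zeros([80, 1]))
total_loss = tf.reduce_mean(tf.square(tf.matmul(features, weights)))

optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE)
train_op = optimizer.minimize(total_loss)
```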