Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
Authors: RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron Weiss, Rob Clark, Rif A. Saurous
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We define several quantitative and subjective metrics for evaluating prosody transfer, and report results with accompanying audio samples from single-speaker and 44-speaker Tacotron models on a prosody transfer task. |
| Researcher Affiliation | Industry | ¹Google, Inc. Correspondence to: RJ Skerry-Ryan <EMAIL>. |
| Pseudocode | No | The paper describes the model architecture and components using text and diagrams (Figure 1, Figure 2, Figure 3), but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "Sound demos are available at https://google.github.io/tacotron/publications/end_to_end_prosody_transfer." This link points to a demo page with audio samples, not to the source code for the methodology itself. |
| Open Datasets | Yes | Single-speaker dataset: A single speaker high-quality English dataset of audiobook recordings by Catherine Byers (the speaker from the 2013 Blizzard Challenge). |
| Dataset Splits | No | The paper provides details on training steps and learning rate decay, but it does not specify any training/validation/test dataset splits or validation set sizes for the main experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions several software components like "Tacotron" and "Adam optimizer" but does not specify their version numbers (e.g., TensorFlow version, Python version, specific library versions). |
| Experiment Setup | Yes | We train our models for at least 200k steps with a mini-batch size of 256 using the Adam optimizer (Kingma & Ba, 2015). We start with a learning rate of $1 \times 10^{-3}$ and decay it to $5 \times 10^{-4}$, $3 \times 10^{-4}$, $1 \times 10^{-4}$, and $5 \times 10^{-5}$ at step 50k, 100k, 150k, and 200k respectively. |
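
The learning-rate schedule quoted in the Experiment Setup row can be read as a piecewise-constant function of the training step. The sketch below is a minimal illustration, not the authors' code. It takes the rates as 1e-3, 5e-4, 3e-4, 1e-4, and 5e-5. The function name `learning_rate` is hypothetical, and the choice that each decay takes effect at exactly the listed step is an assumption, since the quoted text does not say whether the boundaries are inclusive.

```python
import bisect

# Step boundaries and rates as quoted in the Experiment Setup row.
BOUNDARIES = [50_000, 100_000, 150_000, 200_000]
RATES = [1e-3, 5e-4, 3e-4, 1e-4, 5e-5]


def learning_rate(step: int) -> float:
    """Return the piecewise-constant learning rate for a training step.

    Assumption: each decay applies from the listed step onward
    (for example, step 50_000 already uses 5e-4).
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    # bisect_right counts how many boundaries are <= step,
    # which is the index of the rate in effect.
    return RATES[bisect.bisect_right(BOUNDARIES, step)]
```

A call such as `learning_rate(125_000)` falls between the 100k and 150k boundaries and so returns the third rate in the list.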