Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
AutoVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Xuesong Yang, Mark Hasegawa-Johnson
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we will evaluate AUTOVC on many-to-many voice conversion tasks, and empirically validate the assumptions of the AUTOVC framework. We performed two subjective tests on Amazon Mechanical Turk (MTurk). |
| Researcher Affiliation | Collaboration | (1) University of Illinois at Urbana-Champaign, IL, USA; (2) MIT-IBM Watson AI Lab, Cambridge, MA, USA; (3) IBM Research, Cambridge, MA, USA. |
| Pseudocode | No | The paper describes the architecture and process in text and diagrams but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | No | The implementation will become publicly available. |
| Open Datasets | Yes | The evaluation is performed on the VCTK corpus (Veaux et al., 2016), which contains 44 hours of utterances from 109 speakers. In our implementation, the speaker encoder is pre-trained on the combination of VoxCeleb1 (Nagrani et al., 2017) and Librispeech (Panayotov et al., 2015) corpora, where there are a total of 3549 speakers. |
| Dataset Splits | No | The data of each speaker is then partitioned into training and test sets by 9:1. The paper mentions training and test sets but does not explicitly provide details for a validation split. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions various software components and algorithms (e.g., ADAM optimizer, WaveNet vocoder, LSTM, PyTorch), but it does not specify their version numbers for reproducibility. |
| Experiment Setup | Yes | AUTOVC is trained with a batch size of two for 100k steps, using the ADAM optimizer. The speaker embedding is generated by feeding 10 two-second utterances of the same speaker to the speaker encoder and averaging the resulting embeddings. The weights in Eq. (12) are set to λ = 1, μ = 1. |
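
Two steps quoted in the table, the per-speaker 9:1 train/test partition and the averaging of per-utterance speaker embeddings, can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the random shuffle, and the fixed seed are assumptions made for illustration, since the quoted text does not say how utterances were assigned to each set, and the embeddings here are plain lists of floats standing in for speaker-encoder outputs.

```python
import random
from statistics import fmean


def split_speaker_utterances(utterances, test_fraction=0.1, seed=0):
    """Partition one speaker's utterances into (train, test) at roughly 9:1.

    The shuffle and seed are illustrative assumptions; the quoted text
    only states the 9:1 ratio.
    """
    items = list(utterances)
    random.Random(seed).shuffle(items)
    # Keep at least one test utterance, even for speakers with few recordings.
    n_test = max(1, round(len(items) * test_fraction))
    return items[n_test:], items[:n_test]


def average_embeddings(embeddings):
    """Return the element-wise mean of equal-length embedding vectors."""
    if not embeddings:
        raise ValueError("need at least one embedding")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("embeddings must share one dimensionality")
    # zip(*embeddings) walks the vectors one dimension at a time.
    return [fmean(column) for column in zip(*embeddings)]
```

In the setup quoted above, `average_embeddings` would receive 10 vectors, one per two-second utterance of the same speaker, and `split_speaker_utterances` would be called once per speaker.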