Global Prosody Style Transfer Without Text Transcriptions
Authors: Kaizhi Qian, Yang Zhang, Shiyu Chang, Jinjun Xiong, Chuang Gan, David Cox, Mark Hasegawa-Johnson
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on different style transfer tasks show that AUTOPST can effectively convert prosody that correctly reflects the styles of the target domains. |
| Researcher Affiliation | Collaboration | ¹MIT-IBM Watson AI Lab, USA; ²IBM Thomas J. Watson AI Lab, USA; ³University of Illinois at Urbana-Champaign, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We encourage readers to listen to our online demo audios at https://auspicious3000.github.io/AutoPST-Demo. |
| Open Datasets | Yes | Our dataset is VCTK (Veaux et al., 2016), which consists of 44 hours of speech from 109 speakers. |
| Dataset Splits | No | The paper states "We use 24 speakers for training and follow the same train/test partition as in (Qian et al., 2020b)", but the partition itself (e.g., which speakers) is not listed; a hypothetical speaker-level split sketch follows the table. |
| Hardware Specification | No | The paper does not specify the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions that the decoder is a Transformer and that a WaveNet vocoder is used, but it does not name software libraries or version numbers needed for reproducibility (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | More hyperparameter setting details can be found in Appendix C. |
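For the Dataset Splits row above, the following is a minimal, hypothetical sketch of a speaker-level train/test split over VCTK. The paper trains on 24 of the 109 speakers and reuses the partition of Qian et al. (2020b) rather than drawing a random one, and it does not list the speaker IDs, so the ID pattern, the random selection, and the function name here are illustrative assumptions only.

```python
import random

def split_vctk_speakers(speaker_ids, n_train=24, seed=0):
    """Split VCTK speaker IDs into train and test sets at the speaker level.

    Hypothetical sketch: the paper follows the fixed partition of
    Qian et al. (2020b), not a random split like this one.
    """
    rng = random.Random(seed)
    shuffled = list(speaker_ids)
    rng.shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]

# VCTK has 109 speakers with IDs roughly of the form "p225", "p226", ...
# (the contiguous range below is an illustrative assumption).
all_speakers = [f"p{225 + i}" for i in range(109)]
train_speakers, test_speakers = split_vctk_speakers(all_speakers)
print(len(train_speakers), len(test_speakers))  # 24 85
```

Splitting at the speaker level (rather than at the utterance level) keeps every test speaker unseen during training, which is the property the paper's evaluation of cross-speaker prosody transfer depends on.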