P-Flow: A Fast and Data-Efficient Zero-Shot TTS through Speech Prompting

Authors: Sungwon Kim, Kevin J. Shih, Rohan Badlani, João Felipe Santos, Evelina Bakhturina, Mikyas Desta, Rafael Valle, Sungroh Yoon, Bryan Catanzaro

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments: Training and Inference Settings: P-Flow is trained on a single NVIDIA A100 GPU for 800K iterations, using a batch size of 64. We utilize the AdamW optimizer [26] with a learning rate of 0.0001.
Researcher Affiliation | Collaboration | Sungwon Kim1,2, Kevin J. Shih1, Rohan Badlani1, João Felipe Santos1, Evelina Bakhturina1, Mikyas Desta1, Rafael Valle1, Sungroh Yoon2,3, Bryan Catanzaro1... 1Work done as a research intern at NVIDIA. Corresponding authors: Sungwon Kim: ksw0306@snu.ac.kr, Rafael Valle: rafaelvalle@nvidia.com, Sungroh Yoon: sryoon@snu.ac.kr... 2Department of Electrical and Computer Engineering, Seoul National University 3Interdisciplinary Program in Artificial Intelligence, Seoul National University
Pseudocode | No | The paper describes algorithms and processes in text and diagrams but does not include structured pseudocode or algorithm blocks.
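Since the paper provides no algorithm block, the following is only an illustrative sketch of the kind of conditional flow-matching training step that underlies P-Flow's flow-matching decoder. The OT-CFM objective shown is the standard formulation; the function name, the decoder signature, and the `sigma_min` default are our assumptions, and P-Flow's full objective additionally includes an encoder loss not shown here.

```python
import torch
import torch.nn.functional as F

def cfm_training_step(decoder, x1, cond, sigma_min=1e-4):
    """One optimal-transport conditional flow-matching step (hypothetical sketch).

    x1:   target mel-spectrogram batch, shape (B, n_mels, T)
    cond: conditioning from the speech-prompted text encoder
    """
    b = x1.size(0)
    t = torch.rand(b, 1, 1, device=x1.device)        # random flow time in [0, 1]
    x0 = torch.randn_like(x1)                        # Gaussian noise sample
    xt = (1 - (1 - sigma_min) * t) * x0 + t * x1     # OT interpolation path
    u = x1 - (1 - sigma_min) * x0                    # target velocity field
    v = decoder(xt, t.view(b), cond)                 # predicted velocity
    return F.mse_loss(v, u)

# usage with a stand-in decoder (placeholder, not the P-Flow architecture):
decoder = lambda xt, t, cond: xt
loss = cfm_training_step(decoder, torch.randn(2, 80, 100), cond=None)
```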
Open Source Code | Yes | We provide audio samples on our demo page... Demo: https://research.nvidia.com/labs/adlr/projects/pflow
Open Datasets | Yes | Data: We train P-Flow on LibriTTS [41]. The LibriTTS training set consists of 580 hours of data from 2,456 speakers. We specifically use data that is longer than 3 seconds for speech prompting, yielding a 256-hour subset.
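As a concrete illustration of the quoted 3-second filter, here is a minimal sketch using torchaudio; the helper name and the list-of-paths interface are our assumptions, while the threshold comes from the paper.

```python
import torchaudio

MIN_SECONDS = 3.0  # per the paper: keep utterances longer than 3 s for prompting

def filter_long_utterances(wav_paths, min_seconds=MIN_SECONDS):
    """Return only the audio files whose duration exceeds min_seconds."""
    kept = []
    for path in wav_paths:
        info = torchaudio.info(path)                     # reads header metadata only
        duration = info.num_frames / info.sample_rate    # duration in seconds
        if duration > min_seconds:
            kept.append(path)
    return kept
```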
Dataset Splits | No | The paper mentions training on LibriTTS and evaluating on LibriSpeech test-clean but does not explicitly provide details for a validation dataset split.
Hardware Specification | Yes | P-Flow is trained on a single NVIDIA A100 GPU for 800K iterations, using a batch size of 64.
Software Dependencies | No | The paper mentions various software components such as the PyTorch transformer, AdamW optimizer, G2P model, HiFi-GAN, HuBERT ASR model, and WavLM-TDNN, but does not provide specific version numbers for any of them.
Experiment Setup | Yes | Training and Inference Settings: P-Flow is trained on a single NVIDIA A100 GPU for 800K iterations, using a batch size of 64. We utilize the AdamW optimizer [26] with a learning rate of 0.0001. ... Table 9: Hyperparameters of P-Flow
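The quoted settings translate directly into a short PyTorch setup. Only the batch size, iteration count, learning rate, and optimizer choice come from the paper; the stand-in model and all unlisted AdamW arguments (betas, weight decay) are assumptions left at PyTorch defaults.

```python
import torch
from torch import nn

# Quoted from the paper: single A100, 800K iterations, batch size 64, AdamW, lr 1e-4.
BATCH_SIZE = 64
NUM_ITERATIONS = 800_000
LEARNING_RATE = 1e-4

model = nn.Linear(80, 80)  # hypothetical placeholder; see the paper's Table 9 for the real hyperparameters
optimizer = torch.optim.AdamW(model.parameters(), lr=LEARNING_RATE)
```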