Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Authors: Ziyue Jiang, Jinglin Liu, Yi Ren, Jinzheng He, Zhenhui Ye, Shengpeng Ji, Qian Yang, Chen Zhang, Pengfei Wei, Chunfeng Wang, Xiang Yin, Zejun Ma, Zhou Zhao
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate that Mega-TTS 2 could not only synthesize identity-preserving speech with a short prompt of an unseen speaker from arbitrary sources but also consistently outperform the fine-tuning method when the volume of data ranges from 10 seconds to 5 minutes. |
| Researcher Affiliation | Collaboration | Zhejiang University & ByteDance {ziyuejiang,zhaozhou}@zju.edu.cn, {liu.jinglin,ren.yi,yinxiang.stephen}@bytedance.com |
| Pseudocode | No | The paper provides architectural descriptions and procedural steps, but no formal pseudocode or algorithm blocks are included. |
| Open Source Code | No | Audio samples can be found in https://boostprompt.github.io/boostprompt/. (This links to samples, not code). No other specific code release statement found. |
| Open Datasets | Yes | We train Mega-TTS 2 and all baselines on Libri-Light (Kahn et al., 2020), which contains 60K hours of unlabelled speech derived from LibriVox audiobooks. |
| Dataset Splits | Yes | We randomly choose 20 speakers from the LibriSpeech test-clean set and randomly choose 400 seconds of speech for each of them. We split the 400 seconds of speech into a 300-second prompt set and a 100-second target set. (A hedged sketch of this split procedure follows the table.) |
| Hardware Specification | Yes | In the first training stage, we train the first-stage model on 4 NVIDIA A100 GPUs, with a batch size of 48 sentences on each GPU. In the second stage, we train the P-LLM and duration model on 8 NVIDIA A100 GPUs, with a batch size of 4,000 tokens on each GPU. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'HiFi-GAN V1' but does not provide specific version numbers for these or other software dependencies, nor does it specify the programming language or framework versions. |
| Experiment Setup | Yes | We provide model configuration in Appendix A.4 and detailed hyperparameter settings in Table 5. |
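To make the Dataset Splits row above concrete, here is a minimal sketch of how 20 LibriSpeech test-clean speakers could be sampled and their utterances partitioned into a roughly 300-second prompt set and a roughly 100-second target set per speaker. The paper does not release code for this step, so the function name, the `utterances_by_speaker` structure, and the greedy assignment strategy are assumptions for illustration only.

```python
import random

def split_prompt_target(utterances_by_speaker, n_speakers=20,
                        total_sec=400, prompt_sec=300, seed=0):
    """Hypothetical sketch of the evaluation split described in the paper:
    pick 20 speakers, collect ~400 s of speech each, and divide it into a
    ~300 s prompt set and a ~100 s target set.

    utterances_by_speaker: dict mapping speaker_id -> list of
    (utterance_id, duration_in_seconds) pairs.
    """
    rng = random.Random(seed)
    speakers = rng.sample(sorted(utterances_by_speaker), n_speakers)
    splits = {}
    for spk in speakers:
        utts = list(utterances_by_speaker[spk])
        rng.shuffle(utts)

        # Collect utterances until roughly 400 s of speech is selected.
        selected, acc = [], 0.0
        for utt_id, dur in utts:
            if acc >= total_sec:
                break
            selected.append((utt_id, dur))
            acc += dur

        # Greedily assign utterances to the prompt set until ~300 s is
        # reached; the remainder forms the ~100 s target set.
        prompt, target, p_acc = [], [], 0.0
        for utt_id, dur in selected:
            if p_acc < prompt_sec:
                prompt.append(utt_id)
                p_acc += dur
            else:
                target.append(utt_id)
        splits[spk] = {"prompt": prompt, "target": target}
    return splits
```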