FedSpeech: Federated Text-to-Speech with Continual Learning

Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a reduced VCTK dataset to evaluate FedSpeech.
Researcher Affiliation | Collaboration | Ziyue Jiang (Zhejiang University), Yi Ren (Zhejiang University), Ming Lei (Alibaba Group), and Zhou Zhao (Zhejiang University); ziyuejiang341@gmail.com, rayeren@zju.edu.cn, lm86501@alibaba-inc.com, zhaozhou@zju.edu.cn
Pseudocode | Yes | Algorithm 1: Two Rounds of Training with FedSpeech
Open Source Code | No | Synthesized speech samples can be found at: https://fedspeech.github.io/FedSpeech_example/. The paper provides a link to example audio files but does not provide access to the source code for the methodology.
Open Datasets | Yes | We conduct experiments on the VCTK dataset [Veaux et al., 2017].
Dataset Splits | Yes | To simulate the low-resource language scenarios, we randomly select and split the samples from each speaker into 3 sets: 100 samples for training, 20 samples for validation, and 20 samples for testing. (See the split sketch after this table.)
Hardware Specification | Yes | We use 1 Nvidia 1080 Ti GPU, with 11GB memory.
Software Dependencies | No | The paper mentions various software components and models, such as FastSpeech 2, Parallel WaveGAN, and a grapheme-to-phoneme conversion tool, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Each batch contains about 20,000 mel-spectrogram frames. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and follow the learning rate schedule in [Vaswani et al., 2017]. In all experiments, we choose 10 speakers. For each speaker, it takes 4k steps for FedSpeech model training (including the gradual pruning masks training) and 1k steps for selective masks training. (A sketch of this optimizer setup follows the table.)
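
The "Dataset Splits" row quotes a per-speaker 100/20/20 split of the VCTK samples. Below is a minimal sketch of how such a split could be drawn, assuming each speaker's utterance IDs are already collected in a list; the function name, the fixed seed, and the use of Python's random.sample are illustrative assumptions, not details taken from the paper.

    import random

    def split_speaker(utterance_ids, seed=0):
        # Randomly draw 100 train / 20 validation / 20 test utterances for one speaker,
        # mirroring the 100/20/20 split quoted in the "Dataset Splits" row.
        rng = random.Random(seed)
        picked = rng.sample(utterance_ids, 140)
        return picked[:100], picked[100:120], picked[120:140]

    # Example with hypothetical VCTK-style utterance IDs for speaker p225.
    train, val, test = split_speaker([f"p225_{i:03d}.wav" for i in range(1, 200)])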
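
The "Experiment Setup" row states Adam with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and the learning rate schedule of [Vaswani et al., 2017]. The following is a minimal PyTorch sketch of that optimizer configuration, assuming the inverse-square-root ("Noam") warmup schedule from that paper; the placeholder model, d_model, and warmup_steps values are assumptions, since the paper does not report them in the quoted passage.

    import torch

    def noam_lr(step, d_model=256, warmup_steps=4000):
        # Learning-rate factor from Vaswani et al. (2017):
        # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    model = torch.nn.Linear(80, 80)  # placeholder for the acoustic model
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1.0,              # base rate of 1.0; the schedule below supplies the actual rate
        betas=(0.9, 0.98),   # beta1 and beta2 as quoted above
        eps=1e-9,            # epsilon as quoted above
    )
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

    for step in range(5):    # stand-in for the real training loop
        optimizer.step()     # gradient computation omitted in this sketch
        scheduler.step()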