FedSpeech: Federated Text-to-Speech with Continual Learning

Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao

IJCAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a reduced VCTK dataset to evaluate FedSpeech.
Researcher Affiliation | Collaboration | Ziyue Jiang (Zhejiang University), Yi Ren (Zhejiang University), Ming Lei (Alibaba Group), and Zhou Zhao (Zhejiang University); ziyuejiang341@gmail.com, rayeren@zju.edu.cn, lm86501@alibaba-inc.com, zhaozhou@zju.edu.cn
Pseudocode | Yes | Algorithm 1: Two Rounds of Training with FedSpeech
Open Source Code | No | Synthesized speech samples can be found at: https://fedspeech.github.io/FedSpeech_example/. The paper provides a link to example audio files but does not provide access to the source code for the methodology.
Open Datasets | Yes | We conduct experiments on the VCTK dataset [Veaux et al., 2017].
Dataset Splits | Yes | To simulate the low-resource language scenarios, we randomly select and split the samples from each speaker into 3 sets: 100 samples for training, 20 samples for validation, and 20 samples for testing. (See the split sketch after this table.)
Hardware Specification | Yes | We use 1 Nvidia 1080 Ti GPU, with 11GB memory.
Software Dependencies | No | The paper mentions various software components and models, such as FastSpeech 2, Parallel WaveGAN, and a grapheme-to-phoneme conversion tool, but it does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Each batch contains about 20,000 mel-spectrogram frames. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and follow the learning rate schedule in [Vaswani et al., 2017]. In all experiments, we choose 10 speakers. For each speaker, it takes 4k steps for FedSpeech model training (including the gradual pruning masks training) and 1k steps for selective masks training. (A sketch of this optimizer setup follows the table.)
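
The "Dataset Splits" row quotes a per-speaker 100/20/20 split of the VCTK samples. Below is a minimal sketch of how such a split could be drawn, assuming each speaker's utterance IDs are already collected in a list; the function name, the fixed seed, and the use of Python's random.sample are illustrative assumptions, not details taken from the paper.

    import random

    def split_speaker(utterance_ids, seed=0):
        # Randomly draw 100 train / 20 validation / 20 test utterances for one speaker,
        # mirroring the 100/20/20 split quoted in the "Dataset Splits" row.
        rng = random.Random(seed)
        picked = rng.sample(utterance_ids, 140)
        return picked[:100], picked[100:120], picked[120:140]

    # Example with hypothetical VCTK-style utterance IDs for speaker p225.
    train, val, test = split_speaker([f"p225_{i:03d}.wav" for i in range(1, 200)])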
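
The "Experiment Setup" row states Adam with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and the learning rate schedule of [Vaswani et al., 2017]. The following is a minimal PyTorch sketch of that optimizer configuration, assuming the inverse-square-root ("Noam") warmup schedule from that paper; the placeholder model, d_model, and warmup_steps values are assumptions, since the paper does not report them in the quoted passage.

    import torch

    def noam_lr(step, d_model=256, warmup_steps=4000):
        # Learning-rate factor from Vaswani et al. (2017):
        # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    model = torch.nn.Linear(80, 80)  # placeholder for the acoustic model
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1.0,              # base rate of 1.0; the schedule below supplies the actual rate
        betas=(0.9, 0.98),   # beta1 and beta2 as quoted above
        eps=1e-9,            # epsilon as quoted above
    )
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

    for step in range(5):    # stand-in for the real training loop
        optimizer.step()     # gradient computation omitted in this sketch
        scheduler.step()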