FedSpeech: Federated Text-to-Speech with Continual Learning
Authors: Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a reduced VCTK dataset to evaluate FedSpeech. |
| Researcher Affiliation | Collaboration | Ziyue Jiang (Zhejiang University), Yi Ren (Zhejiang University), Ming Lei (Alibaba Group), Zhou Zhao (Zhejiang University); ziyuejiang341@gmail.com, rayeren@zju.edu.cn, lm86501@alibaba-inc.com, zhaozhou@zju.edu.cn |
| Pseudocode | Yes | Algorithm 1: Two Rounds of Training with FedSpeech |
| Open Source Code | No | Synthesized speech samples can be found at: https://fedspeech.github.io/FedSpeech_example/ . The paper links to example audio files but does not provide access to the source code for the method. |
| Open Datasets | Yes | We conduct experiments on the VCTK dataset [Veaux et al., 2017] |
| Dataset Splits | Yes | To simulate the low-resource language scenarios, we randomly select and split the samples from each speaker into 3 sets: 100 samples for training, 20 samples for validation, and 20 samples for testing. (See the split sketch below the table.) |
| Hardware Specification | Yes | We use 1 Nvidia 1080 Ti GPU with 11 GB memory. |
| Software Dependencies | No | The paper mentions various software components and models like FastSpeech 2, Parallel WaveGAN, and a grapheme-to-phoneme conversion tool, but it does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Each batch contains about 20,000 mel-spectrogram frames. We use the Adam optimizer with β1 = 0.9, β2 = 0.98, ϵ = 10⁻⁹ and follow the learning rate schedule in [Vaswani et al., 2017]. In all experiments, we choose 10 speakers. For each speaker, it takes 4k steps for FedSpeech model training (including the gradual pruning masks training) and 1k steps for selective masks training. (See the optimizer/schedule sketch below the table.) |
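
The 100/20/20 per-speaker split reported in the Dataset Splits row is straightforward to reproduce. Below is a minimal sketch of such a split; the function name, the fixed seed, and the `vctk` mapping in the usage comment are illustrative assumptions, not details given in the paper.

```python
import random

def split_speaker_samples(samples, seed=0):
    """Randomly split one speaker's utterances into
    100 train / 20 validation / 20 test samples."""
    assert len(samples) >= 140, "each speaker needs at least 140 utterances"
    rng = random.Random(seed)   # fixed seed for a reproducible split (our choice)
    shuffled = samples[:]       # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    return shuffled[:100], shuffled[100:120], shuffled[120:140]

# Usage over the 10 selected speakers, assuming a dict of speaker ID -> utterances:
# splits = {spk: split_speaker_samples(utts) for spk, utts in vctk.items()}
```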
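The learning rate schedule cited in the Experiment Setup row is the inverse-square-root warmup schedule of Vaswani et al. (2017). The following PyTorch sketch pairs that schedule with the Adam hyperparameters reported in the paper; the model dimension, warmup length, and the stand-in model are assumptions for illustration, not values the paper reports.

```python
import torch

d_model, warmup_steps = 256, 4000   # assumed values, not reported in the paper

model = torch.nn.Linear(80, 80)     # stand-in module; FedSpeech builds on FastSpeech 2
optimizer = torch.optim.Adam(
    model.parameters(),
    lr=1.0,                         # base lr; the schedule supplies the full factor
    betas=(0.9, 0.98),              # β1, β2 as reported in the paper
    eps=1e-9)                       # ϵ as reported in the paper

def noam_lr(step):
    # Vaswani et al. (2017): d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)
    step = max(step, 1)             # guard against step = 0 at construction time
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)
# Call scheduler.step() once per optimizer step during training.
```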