UWSpeech: Speech to Speech Translation for Unwritten Languages
Authors: Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu
AAAI 2021, pp. 14319–14327 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and the VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrates the advantages and potential of UWSpeech. |
| Researcher Affiliation | Collaboration | Zhejiang University, China; Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 UWSpeech Training and Inference |
| Open Source Code | Yes | Speech samples and experimental details can be found in https://speechresearch.github.io/uwspeech/ |
| Open Datasets | Yes | We choose Fisher Spanish-English dataset (Post et al. 2013) for translation. ... Both the German and French datasets are from Common Voice... For the Chinese dataset, we use AIShell (Bu et al. 2017) |
| Dataset Splits | No | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. ... After the training of XL-VAE, the phoneme error rates (PER) of three written languages (German, French and Chinese) on the development set are 16%, 21% and 12% respectively. |
| Hardware Specification | Yes | The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. |
| Software Dependencies | No | Our code is implemented based on tensor2tensor library (Vaswani et al. 2018). |
| Experiment Setup | Yes | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. ... We set beam size to 4 and the length penalty to 1.0. |