UWSpeech: Speech to Speech Translation for Unwritten Languages

Authors: Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu

AAAI 2021, pp. 14319-14327

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms the direct translation and VQ-VAE baselines by about 16 and 10 BLEU points respectively, which demonstrates the advantages and potential of UWSpeech. (A corpus-BLEU scoring sketch follows the table.)
Researcher Affiliation | Collaboration | Zhejiang University, China; Microsoft Research Asia
Pseudocode | Yes | Algorithm 1: UWSpeech Training and Inference. (A pipeline sketch follows the table.)
Open Source Code | Yes | Speech samples and experimental details can be found at https://speechresearch.github.io/uwspeech/
Open Datasets | Yes | We choose the Fisher Spanish-English dataset (Post et al. 2013) for translation. ... Both the German and French datasets are from Common Voice. ... For the Chinese dataset, we use AIShell (Bu et al. 2017).
Dataset Splits | No | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. After the training of XL-VAE, the phoneme error rates (PER) of the three written languages (German, French and Chinese) on the development set are 16%, 21% and 12% respectively. (A PER computation sketch follows the table.)
Hardware Specification | Yes | The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs.
Software Dependencies | No | Our code is implemented based on the tensor2tensor library (Vaswani et al. 2018).
Experiment Setup | Yes | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. ... We set the beam size to 4 and the length penalty to 1.0. (A length-normalization sketch follows the table.)
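
The Pseudocode row points to Algorithm 1 (UWSpeech Training and Inference). As a reading aid, here is a minimal Python sketch of the three-component structure the paper describes: a converter that discretizes target speech with XL-VAE, a translator from source speech to target discrete tokens, and an inverter that synthesizes target speech from those tokens. Every class and method name below is a hypothetical stand-in, not the authors' API; see Algorithm 1 in the paper for the actual procedure.

    # Hedged sketch of UWSpeech's converter/translator/inverter structure.
    # All names are hypothetical placeholders.
    class UWSpeechPipeline:
        def __init__(self, converter, translator, inverter):
            self.converter = converter    # XL-VAE encoder: target speech -> discrete tokens
            self.translator = translator  # seq2seq model: source speech -> discrete tokens
            self.inverter = inverter      # XL-VAE decoder: discrete tokens -> target speech

        def train_step(self, source_speech, target_speech):
            # Training: the converter supplies discrete token targets for the translator.
            target_tokens = self.converter.encode(target_speech)
            return self.translator.loss(source_speech, target_tokens)

        def translate(self, source_speech):
            # Inference: no converter needed; translate to tokens, then invert to speech.
            tokens = self.translator.decode(source_speech, beam_size=4, length_penalty=1.0)
            return self.inverter.synthesize(tokens)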
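
The Research Type row cites gains of about 16 and 10 BLEU points over the direct translation and VQ-VAE baselines. For readers checking such numbers, corpus-level BLEU can be computed with sacrebleu; this is a common tooling choice and an assumption here, since the report does not name the authors' scorer.

    # Corpus BLEU with sacrebleu (pip install sacrebleu). Illustrative data only.
    import sacrebleu

    hypotheses = ["the cat sat on the mat"]
    references = [["the cat sat on the mat"]]  # one list per reference stream

    score = sacrebleu.corpus_bleu(hypotheses, references)
    print(score.score)  # BLEU on a 0-100 scale; a 16-point gap means 16 on this scale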
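
The Dataset Splits row quotes phoneme error rates of 16%, 21% and 12% on the development set. PER is conventionally the Levenshtein edit distance between predicted and reference phoneme sequences, normalized by the reference length; a minimal implementation of that standard metric (not the authors' evaluation code):

    def phoneme_error_rate(ref, hyp):
        """Edit distance between phoneme sequences, divided by reference length."""
        # d[j] = edit distance between the processed prefix of ref and hyp[:j].
        d = list(range(len(hyp) + 1))
        for i, r in enumerate(ref, start=1):
            prev_diag, d[0] = d[0], i
            for j, h in enumerate(hyp, start=1):
                prev_diag, d[j] = d[j], min(
                    d[j] + 1,              # deletion of r
                    d[j - 1] + 1,          # insertion of h
                    prev_diag + (r != h),  # substitution; free if phonemes match
                )
        return d[-1] / len(ref)

    # Example: one substitution over three reference phonemes -> PER = 1/3.
    assert round(phoneme_error_rate(["k", "ae", "t"], ["k", "ah", "t"]), 2) == 0.33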
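
The Experiment Setup row fixes the beam size to 4 and the length penalty to 1.0. In the tensor2tensor library the code builds on, beam candidates are ranked with GNMT-style length normalization; the sketch below assumes that standard formula, with alpha playing the role of the reported length penalty.

    # GNMT-style length normalization as used by tensor2tensor's beam search:
    # candidates are ranked by log P(Y|X) / lp(Y), with lp(Y) = ((5 + |Y|) / 6) ** alpha.
    def normalized_score(log_prob: float, length: int, alpha: float = 1.0) -> float:
        length_penalty = ((5.0 + length) / 6.0) ** alpha
        return log_prob / length_penalty

    # Example: after normalization, a longer hypothesis with lower raw log-probability
    # (-6.0 over 20 tokens) can outrank a shorter one (-4.0 over 10 tokens).
    print(normalized_score(-4.0, 10), normalized_score(-6.0, 20))  # -1.6 vs. -1.44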