UWSpeech: Speech to Speech Translation for Unwritten Languages
Authors: Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu
AAAI 2021, pp. 14319–14327 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and the VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrates the advantages and potential of UWSpeech. |
| Researcher Affiliation | Collaboration | Zhejiang University, China; Microsoft Research Asia |
| Pseudocode | Yes | Algorithm 1 UWSpeech Training and Inference |
| Open Source Code | Yes | Speech samples and experimental details can be found in https://speechresearch.github.io/uwspeech/ |
| Open Datasets | Yes | We choose Fisher Spanish-English dataset (Post et al. 2013) for translation. ... Both the German and French datasets are from Common Voice... For the Chinese dataset, we use AIShell (Bu et al. 2017) |
| Dataset Splits | No | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. ... After the training of XL-VAE, the phoneme error rates (PER) of three written languages (German, French and Chinese) on the development set are 16%, 21% and 12% respectively. |
| Hardware Specification | Yes | The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. |
| Software Dependencies | No | Our code is implemented based on tensor2tensor library (Vaswani et al. 2018). |
| Experiment Setup | Yes | We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. ... We set beam size to 4 and the length penalty to 1.0. |