Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

UWSpeech: Speech to Speech Translation for Unwritten Languages

Authors: Chen Zhang, Xu Tan, Yi Ren, Tao Qin, Kejun Zhang, Tie-Yan Liu (pp. 14319-14327)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on Fisher Spanish-English conversation translation dataset show that UWSpeech outperforms direct translation and VQ-VAE baseline by about 16 and 10 BLEU points respectively, which demonstrate the advantages and potentials of UWSpeech."
Researcher Affiliation | Collaboration | "Zhejiang University, China; Microsoft Research Asia"
Pseudocode | Yes | "Algorithm 1 UWSpeech Training and Inference"
Open Source Code | Yes | "Speech samples and experimental details can be found in https://speechresearch.github.io/uwspeech/"
Open Datasets | Yes | "We choose Fisher Spanish-English dataset (Post et al. 2013) for translation. ... Both the German and French datasets are from Common Voice. ... For the Chinese dataset, we use AIShell (Bu et al. 2017)."
Dataset Splits | No | "We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. After the training of XL-VAE, the phoneme error rates (PER) of three written languages (German, French and Chinese) on the development set are 16%, 21% and 12% respectively."
Hardware Specification | Yes | "The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs."
Software Dependencies | No | "Our code is implemented based on tensor2tensor library (Vaswani et al. 2018)."
Experiment Setup | Yes | "We choose the λ in Equation 6 according to the validation performance and set λ to 0.01. The batch size is set to 25K frames for each GPU and the XL-VAE training takes 200K steps on 4 Tesla V100 GPUs. ... We set beam size to 4 and the length penalty to 1.0."
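The Experiment Setup quote mentions decoding with beam size 4 and length penalty 1.0. The paper does not spell out the penalty formula; the sketch below assumes the GNMT-style length normalization commonly used in tensor2tensor-based decoders, where a hypothesis's log-probability is divided by ((5 + |Y|) / 6)^alpha. It is an illustration of how such a setting rescores beam hypotheses, not the authors' code.

```python
import math


def length_penalty(length: int, alpha: float = 1.0) -> float:
    """GNMT-style length penalty: ((5 + |Y|) / 6) ** alpha (assumed form)."""
    return ((5.0 + length) / 6.0) ** alpha


def rescore(total_log_prob: float, length: int, alpha: float = 1.0) -> float:
    """Score used to rank finished beam hypotheses: log P(Y) / lp(Y)."""
    return total_log_prob / length_penalty(length, alpha)


# Two hypothetical finished hypotheses from a beam of size 4.
# With alpha > 0, longer hypotheses are penalized less per token,
# so a longer output with a lower total log-prob can still win.
short_hyp = rescore(-4.5, length=5)
long_hyp = rescore(-6.0, length=10)
best = max(short_hyp, long_hyp)
```

With alpha = 0 the penalty is 1 for every length and ranking falls back to raw log-probability; alpha = 1.0, as quoted, normalizes fairly aggressively toward longer translations.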