Revisiting End-to-End Speech-to-Text Translation From Scratch

Authors: Biao Zhang, Barry Haddow, Rico Sennrich

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On four benchmarks covering 23 languages, our experiments show that, without using any transcripts or pretraining, the proposed system reaches and even outperforms previous studies adopting pretraining, although the gap remains in (extremely) low-resource settings. Experimental results show that the significance of pretraining has been over-estimated in prior work, and integrating techniques to improve E2E ST from scratch is feasible and promising.
Researcher Affiliation | Academia | School of Informatics, University of Edinburgh; Department of Computational Linguistics, University of Zurich.
Pseudocode | No | The paper describes the methods and algorithms in natural language and mathematical equations, but it does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | Source code is available at https://github.com/bzhangGo/zero.
Open Datasets | Yes | We work on four benchmarks covering different domains and 23 languages from diverse language families. MuST-C ... (Di Gangi et al., 2019), ... LibriSpeech En-Fr ... (Kocabiyikoglu et al., 2018). ... Kosp2e Ko-En ... (Cho et al., 2021). ... CoVoST ... (Ardila et al., 2020).
Dataset Splits | Yes | For each benchmark, we use the official train/dev/test split for experiments.
Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instance details).
Software Dependencies | No | The paper mentions software tools like "Adam" (Kingma & Ba, 2015), "Moses" (Koehn et al., 2007), and "SacreBLEU" (Post, 2018), but it does not provide specific version numbers for any of these software dependencies. (See the SacreBLEU scoring sketch after this table.)
Experiment Setup Yes We employ Adam (Kingma & Ba, 2015, β1 = 0.9, β2 = 0.98) for parameter update using adaptive learning rate schedule as in (Vaswani et al., 2017) with a warmup step of 4K and label smoothing of 0.1. Dropout of rate 0.2 is applied to residual connections and Re LU activations. We organize training samples of around 20K target subwords into one batch, and train models up to 50K steps.