Revisiting End-to-End Speech-to-Text Translation From Scratch
Authors: Biao Zhang, Barry Haddow, Rico Sennrich
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On four benchmarks covering 23 languages, our experiments show that, without using any transcripts or pretraining, the proposed system reaches and even outperforms previous studies adopting pretraining, although the gap remains in (extremely) low-resource settings. Experimental results show that the significance of pretraining has been over-estimated in prior work, and integrating techniques to improve E2E ST from scratch is feasible and promising. |
| Researcher Affiliation | Academia | 1) School of Informatics, University of Edinburgh; 2) Department of Computational Linguistics, University of Zurich. |
| Pseudocode | No | The paper describes the methods and algorithms in natural language and mathematical equations, but it does not include any formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Source code is available at https://github.com/bzhangGo/zero. |
| Open Datasets | Yes | We work on four benchmarks covering different domains and 23 languages from diverse language families. MuST-C ... (Di Gangi et al., 2019), ... LibriSpeech En-Fr ... (Kocabiyikoglu et al., 2018). ... Kosp2e Ko-En ... (Cho et al., 2021). ... CoVoST ... (Ardila et al., 2020). |
| Dataset Splits | Yes | For each benchmark, we use the official train/dev/test split for experiments. |
| Hardware Specification | No | The paper does not specify any particular hardware used for running the experiments (e.g., GPU models, CPU types, or cloud computing instance details). |
| Software Dependencies | No | The paper mentions software tools like "Adam" (Kingma & Ba, 2015), "Moses" (Koehn et al., 2007), and "SacreBLEU" (Post, 2018), but it does not provide specific version numbers for any of these software dependencies. |
| Experiment Setup | Yes | We employ Adam (Kingma & Ba, 2015, β1 = 0.9, β2 = 0.98) for parameter update using adaptive learning rate schedule as in (Vaswani et al., 2017) with a warmup step of 4K and label smoothing of 0.1. Dropout of rate 0.2 is applied to residual connections and ReLU activations. We organize training samples of around 20K target subwords into one batch, and train models up to 50K steps. |
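The experiment-setup quote above maps onto a fairly standard Transformer optimization recipe. Below is a minimal PyTorch sketch of that recipe, not the authors' code (their zero toolkit is linked above): Adam with β1 = 0.9, β2 = 0.98, the inverse-square-root learning-rate schedule of Vaswani et al. (2017) with 4K warmup steps, label smoothing 0.1, and dropout 0.2. The model width `D_MODEL` and the `model` placeholder are assumptions for illustration; the batch size (~20K target subwords) and 50K-step budget appear only as comments.

```python
# Sketch of the quoted training setup; hyperparameter values come from the
# table above, while D_MODEL and the toy `model` are hypothetical stand-ins.
import torch

D_MODEL = 512          # assumed Transformer width; not stated in the table
WARMUP_STEPS = 4000    # "warmup step of 4K"

def noam_lr(step: int) -> float:
    """Vaswani et al. (2017) schedule:
    lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    step = max(step, 1)
    return D_MODEL ** -0.5 * min(step ** -0.5, step * WARMUP_STEPS ** -1.5)

model = torch.nn.Linear(D_MODEL, D_MODEL)  # stand-in for the actual ST model

optimizer = torch.optim.Adam(model.parameters(), lr=1.0, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

# Cross-entropy over target subwords with label smoothing of 0.1.
criterion = torch.nn.CrossEntropyLoss(label_smoothing=0.1)

# Dropout of 0.2 would be applied to residual connections and ReLU
# activations inside the Transformer blocks, e.g. torch.nn.Dropout(p=0.2).
# Batches hold roughly 20K target subwords; training runs up to 50K steps,
# calling optimizer.step() and scheduler.step() once per step.
```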