Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Authors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three datasets consistently show that Translatotron 2 outperforms the original Translatotron by a large margin on both translation quality (up to +15.5 BLEU) and speech generation quality, and approaches the same of cascade systems. |
| Researcher Affiliation | Industry | 1Google Research. Correspondence to: Ye Jia <jiaye@google.com>. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Audio samples from Translatotron 2 are available online at https://google-research.github.io/lingvo-lab/translatotron2/ (Navigating to this link reveals: "Audio samples and source code are available on our GitHub repository." with a link to https://github.com/google/lingvo/tree/master/lingvo/tasks/s2st/translatotron2) |
| Open Datasets | Yes | We conducted experiments on three datasets, including two Spanish→English datasets and a multilingual→English dataset. ... Table 1: Datasets for experiments with translation speech in a single speaker's voice: Conversational (Jia et al., 2019a), Fisher Es-En (Post et al., 2013), CoVoST 2 (Wang et al., 2021a). ... The original Common Voice (Ardila et al., 2020) data split instead of the CoVoST 2 data split was followed. |
| Dataset Splits | Yes | The checkpoints for evaluation were picked by the best average BLEU on 4 language pairs on the validation set. ... The original Common Voice (Ardila et al., 2020) data split instead of the CoVoST 2 data split was followed. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models, or cloud computing instance types. |
| Software Dependencies | No | All models were implemented using the Lingvo framework (Shen et al., 2019). The paper mentions the framework but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | A. Table of hyper-parameters. Table 7: Model hyper-parameters used in the experiments. (Details such as sample rate, mel channels, frame size, SpecAugment parameters, Conformer dimensions, attention heads, LSTM dimensions, learning rate, batch size, etc., are provided.) |
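The checkpoint-selection rule quoted in the Dataset Splits row (pick the checkpoint with the best average validation BLEU across 4 language pairs) can be sketched as below. This is a minimal illustration, not the paper's code: the data layout, checkpoint names, and scores are all hypothetical.

```python
# Hypothetical sketch of the paper's checkpoint-selection criterion:
# choose the checkpoint whose average validation BLEU over the
# 4 language pairs is highest. Names and numbers are illustrative.

def select_best_checkpoint(val_bleu):
    """val_bleu maps checkpoint_id -> {language_pair: BLEU score}.

    Returns the checkpoint id with the highest mean BLEU across pairs.
    """
    def mean_bleu(scores):
        return sum(scores.values()) / len(scores)

    return max(val_bleu, key=lambda ckpt: mean_bleu(val_bleu[ckpt]))


# Made-up validation scores for two checkpoints over 4 language pairs:
scores = {
    "ckpt-100k": {"es-en": 30.1, "fr-en": 28.4, "de-en": 22.0, "ca-en": 19.5},
    "ckpt-120k": {"es-en": 31.0, "fr-en": 29.2, "de-en": 21.8, "ca-en": 20.1},
}

best = select_best_checkpoint(scores)  # ckpt-120k has the higher average
```

Averaging across language pairs (rather than picking per-pair maxima) yields a single checkpoint for all pairs, which matches the multilingual evaluation setup the report quotes.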