Translatotron 2: High-quality direct speech-to-speech translation with voice preservation

Authors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz

ICML 2022 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three datasets consistently show that Translatotron 2 outperforms the original Translatotron by a large margin on both translation quality (up to +15.5 BLEU) and speech generation quality, and approaches that of cascade systems.
Researcher Affiliation | Industry | Google Research. Correspondence to: Ye Jia <jiaye@google.com>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Audio samples from Translatotron 2 are available online at https://google-research.github.io/lingvo-lab/translatotron2/ (navigating to this link reveals "Audio samples and source code are available on our GitHub repository." with a link to https://github.com/google/lingvo/tree/master/lingvo/tasks/s2st/translatotron2).
Open Datasets | Yes | We conducted experiments on three datasets, including two Spanish→English datasets and a multilingual→English dataset. ... Table 1: Datasets for experiments with translation speech in a single speaker's voice: Conversational (Jia et al., 2019a), Fisher Es-En (Post et al., 2013), CoVoST 2 (Wang et al., 2021a). ... The original Common Voice (Ardila et al., 2020) data split was followed instead of the CoVoST 2 data split. (See the CoVoST 2 loading sketch after this table.)
Dataset Splits | Yes | The checkpoints for evaluation were picked by the best average BLEU on 4 language pairs on the validation set. ... The original Common Voice (Ardila et al., 2020) data split was followed instead of the CoVoST 2 data split.
Hardware Specification | No | The paper does not provide specific details about the hardware used, such as CPU or GPU models or cloud computing instance types.
Software Dependencies | No | All models were implemented using the Lingvo framework (Shen et al., 2019). The paper names the framework but does not specify its version or any other software dependencies with version numbers. (See the version-recording sketch after this table.)
Experiment Setup | Yes | Appendix A, Table 7: Model hyper-parameters used in the experiments. Details such as sample rate, mel channels, frame size, SpecAugment parameters, Conformer dims, attention heads, LSTM dims, learning rate, and batch size are provided. (See the hyper-parameter sketch below.)
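
Of the three datasets in the Open Datasets row, CoVoST 2 is the most directly reusable. As a hedged illustration, the sketch below loads its Spanish→English portion through the Hugging Face `datasets` hub; the `facebook/covost2` dataset name, the `es_en` config, and the requirement to point `data_dir` at a locally downloaded Common Voice 4 archive are properties of that hub loader, not anything the paper specifies.

```python
# A minimal sketch, assuming the Hugging Face hub copy of CoVoST 2.
# CoVoST 2 ships only the translations; the underlying Common Voice 4
# audio must be downloaded separately, hence the local `data_dir` path.
from datasets import load_dataset

covost = load_dataset(
    "facebook/covost2",
    "es_en",                                # Spanish -> English pair
    data_dir="/path/to/common_voice_4/es",  # hypothetical local path
    # newer `datasets` releases may also require trust_remote_code=True
)

sample = covost["train"][0]
print(sample["sentence"])     # Spanish transcript
print(sample["translation"])  # English reference translation
# sample["audio"] carries the decoded waveform and its sampling rate
```

Note that the paper follows the original Common Voice split rather than the CoVoST 2 split, so a faithful reproduction would re-partition these examples accordingly.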
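Because the Software Dependencies row flags the missing version numbers, anyone reproducing the setup has to pin the environment themselves. The standard-library sketch below records installed versions; the package list is an assumption about what a typical Lingvo environment contains.

```python
# A minimal sketch for pinning the unspecified dependency versions.
# The package names are assumptions about a typical Lingvo setup,
# not versions reported in the paper.
from importlib import metadata

for pkg in ("lingvo", "tensorflow", "numpy"):
    try:
        print(f"{pkg}=={metadata.version(pkg)}")
    except metadata.PackageNotFoundError:
        print(f"{pkg}: not installed")
```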
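Finally, to show the shape of what the Experiment Setup row refers to, here is a sketch of the hyper-parameter categories Appendix A's Table 7 reports, written as a plain Python dict. Every value is an illustrative placeholder; the paper's actual numbers live in Table 7 and are not reproduced here.

```python
# Hypothetical layout of a Table 7-style hyper-parameter set. Keys mirror
# the categories the table reports; ALL values are placeholders, not the
# paper's actual numbers.
hparams = {
    # audio frontend
    "sample_rate_hz": 16_000,      # placeholder
    "mel_channels": 80,            # placeholder
    "frame_size_ms": 25,           # placeholder
    # SpecAugment
    "specaugment_freq_masks": 2,   # placeholder
    "specaugment_time_masks": 2,   # placeholder
    # model dimensions
    "conformer_dim": 512,          # placeholder
    "attention_heads": 8,          # placeholder
    "lstm_dim": 1024,              # placeholder
    # optimization
    "learning_rate": 1e-3,         # placeholder
    "batch_size": 256,             # placeholder
}
```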