DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation
Authors: Qingkai Fang, Yan Zhou, Yang Feng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on the CVSS Fr→En benchmark demonstrate that DASpeech can achieve comparable or even better performance than the state-of-the-art S2ST model Translatotron 2, while preserving up to 18.53× speedup compared to the autoregressive baseline. |
| Researcher Affiliation | Academia | Qingkai Fang (1,2), Yan Zhou (1,2), Yang Feng (1,2); (1) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICT/CAS); (2) University of Chinese Academy of Sciences, Beijing, China; {fangqingkai21b,zhouyan23z,fengyang}@ict.ac.cn |
| Pseudocode | No | The paper describes its algorithms (e.g., the forward algorithm, backward algorithm, and Viterbi decoding over the decoder graph) using mathematical notation and descriptive text, but it does not provide any explicitly labeled pseudocode or algorithm blocks. (A hedged sketch of the DAG forward pass appears after this table.) |
| Open Source Code | Yes | Code is publicly available at https://github.com/ictnlp/DASpeech. |
| Open Datasets | Yes | We conduct experiments on the CVSS dataset [4], a large-scale S2ST corpus containing speech-to-speech translation pairs from 21 languages to English. |
| Dataset Splits | Yes | For the weight of TTS loss µ, we experiment with µ ∈ {1.0, 2.0, 5.0, 10.0} and choose µ = 5.0 according to results on the dev set. (An illustrative loss-weighting snippet follows the table.) |
| Hardware Specification | Yes | All models are trained on 4 RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions several software tools and libraries, such as fairseq, the ASR-BLEU toolkit, SacreBLEU, the SentencePiece toolkit, the Adam optimizer, and the HiFi-GAN vocoder, but it does not specify their version numbers. |
| Experiment Setup | Yes | For model regularization, we set dropout to 0.1 and weight decay to 0.01, and no label smoothing is used. ... During finetuning, we train the entire model for 50k updates with a batch of 320k audio frames. The learning rate warms up to 1e-3 within 4k steps. We use Adam optimizer [23] for both pretraining and finetuning. For the weight of TTS loss µ, we experiment with µ ∈ {1.0, 2.0, 5.0, 10.0} and choose µ = 5.0 according to results on the dev set. (A sketch of the warmup schedule follows the table.) |
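The forward algorithm noted in the Pseudocode row is the core of the DA-Transformer decoder that DASpeech builds on: a dynamic program that marginalizes the target likelihood over all vertex paths through the decoder's directed acyclic graph. Below is a minimal, hedged sketch of that recursion; the function name, tensor shapes, and arguments are illustrative assumptions, not the repo's actual API.

```python
import torch

def dag_log_likelihood(emit_logp: torch.Tensor,
                       trans_logp: torch.Tensor,
                       target: torch.Tensor) -> torch.Tensor:
    """Forward-algorithm sketch for a DAG decoder (illustrative only).

    emit_logp:  (L, V) log P(token | vertex) for L graph vertices
    trans_logp: (L, L) log transition probabilities; entries for
                non-forward edges (v >= u) should be -inf so paths
                only move left to right through the DAG
    target:     (n,)   target token ids

    Returns log P(target), marginalized over all vertex paths of
    length n that start at vertex 0 and end at vertex L - 1.
    """
    L = emit_logp.size(0)
    n = target.size(0)
    # alpha[u] = log prob of emitting target[:i+1] with token i at vertex u
    alpha = torch.full((L,), float("-inf"))
    alpha[0] = emit_logp[0, target[0]]  # paths must start at vertex 0
    for i in range(1, n):
        # transition to a later vertex, then emit the next target token
        alpha = torch.logsumexp(alpha.unsqueeze(1) + trans_logp, dim=0)
        alpha = alpha + emit_logp[:, target[i]]
    return alpha[L - 1]  # paths must end at the final vertex
```

At inference the paper instead decodes the most probable path (Viterbi, i.e., the same recursion with `max` in place of `logsumexp` plus backtracking) and feeds that path's decoder hidden states to the FastSpeech 2 acoustic decoder.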
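The TTS loss weight µ quoted in the Dataset Splits and Experiment Setup rows implies a weighted-sum training objective. A minimal sketch, assuming a plain weighted sum (the exact combination is defined in the fairseq-based repo, not in the quote):

```python
import torch

def total_loss(dat_loss: torch.Tensor,
               tts_loss: torch.Tensor,
               mu: float = 5.0) -> torch.Tensor:
    """Combine the translation (DA-Transformer) loss with the TTS loss.

    mu = 5.0 was chosen from {1.0, 2.0, 5.0, 10.0} on the CVSS dev set;
    the plain-sum form here is an assumption based on the quoted
    "weight of TTS loss".
    """
    return dat_loss + mu * tts_loss
```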
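The quoted learning-rate behavior ("warms up to 1e-3 within 4k steps" under Adam) is consistent with fairseq's standard inverse-sqrt scheduler. A hedged sketch follows; the post-warmup decay shape is an assumption, since the quote specifies only the warmup target and duration.

```python
def inverse_sqrt_lr(step: int,
                    peak_lr: float = 1e-3,
                    warmup_steps: int = 4000) -> float:
    """Fairseq-style inverse-sqrt schedule (assumed, not confirmed).

    Linearly warms up to peak_lr over warmup_steps, then decays
    proportionally to 1 / sqrt(step).
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * (warmup_steps / step) ** 0.5
```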