Fuzzy Alignments in Directed Acyclic Graph for Non-Autoregressive Machine Translation

Authors: Zhengrui Ma, Chenze Shao, Shangtong Gui, Min Zhang, Yang Feng

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on major WMT benchmarks show that our method substantially improves translation performance and increases prediction confidence, setting a new state of the art for NAT on the raw training data."
Researcher Affiliation | Academia | Zhengrui Ma (1,2), Chenze Shao (1,2), Shangtong Gui (1,2), Min Zhang (3) & Yang Feng (1,2); affiliations: (1) Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences; (2) University of Chinese Academy of Sciences; (3) Harbin Institute of Technology, Shenzhen
Pseudocode | Yes | Algorithm 1: Calculation of $\mathbb{E}_y[C_g(y)]$ and $\mathbb{E}_y[\sum_{g \in G_n(y)} C_g(y)]$ (see the illustrative sketch after the table)
Open Source Code | Yes | Source code: https://github.com/ictnlp/FA-DAT
Open Datasets | Yes | "We conduct experiments on two major benchmarks that are widely used in previous studies: WMT14 English-German (EN-DE, 4M) and WMT17 Chinese-English (ZH-EN, 20M). Newstest2013 as the validation set and newstest2014 as the test set for EN-DE; devtest2017 as the validation set and newstest2017 as the test set for ZH-EN."
Dataset Splits | Yes | "Newstest2013 as the validation set and newstest2014 as the test set for EN-DE; devtest2017 as the validation set and newstest2017 as the test set for ZH-EN."
Hardware Specification | Yes | "All the experiments are conducted on GeForce RTX 3090 GPUs."
Software Dependencies | No | The paper mentions implementing models with the "open-source toolkit fairseq (Ott et al., 2019)" but does not provide specific version numbers for fairseq or other software dependencies.
Experiment Setup | Yes | "During both pretraining and finetuning, we set dropout rate to 0.1, weight decay to 0.01, and no label smoothing is applied. In pretraining, all models are trained for 300k updates with a batch size of 64k tokens. The learning rate warms up to 5e-4 within 10k steps. In finetuning, we use the batch of 256k tokens to stabilize the gradients and train models for 5k updates. The learning rate warms up to 2e-4 within 500 steps." (summarized in the configuration sketch after the table)
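
The Algorithm 1 quoted in the Pseudocode row computes expected n-gram statistics over the decoding DAG. The snippet below is only a rough, illustrative sketch, not the authors' Algorithm 1: it shows how the expected count of a single n-gram, $\mathbb{E}_y[C_g(y)]$, could be computed by a log-space dynamic program over a DAG with per-vertex emission probabilities and upper-triangular transition probabilities. The tensor layout and the names `emit_logp` / `trans_logp` are assumptions made for this sketch; the second quantity, $\mathbb{E}_y[\sum_{g \in G_n(y)} C_g(y)]$, would be accumulated analogously over all n-grams of the reference.

```python
import torch

def expected_ngram_count(emit_logp, trans_logp, ngram):
    """Illustrative sketch, not the paper's exact Algorithm 1.

    emit_logp:  [L, V] tensor, log P(token | vertex)
    trans_logp: [L, L] tensor, log P(u -> v), upper-triangular (u < v)
    ngram:      list of n token ids
    Returns the expected number of occurrences of `ngram` along a random
    path through the DAG, assuming every path starts at vertex 0.
    """
    L = emit_logp.size(0)
    neg_inf = float("-inf")

    # alpha[v]: log marginal probability that a path visits vertex v.
    alpha = torch.full((L,), neg_inf)
    alpha[0] = 0.0
    for v in range(1, L):
        alpha[v] = torch.logsumexp(alpha[:v] + trans_logp[:v, v], dim=0)

    # beta[v]: log expected mass of partial n-gram matches whose last
    # matched token is emitted at vertex v.
    beta = alpha + emit_logp[:, ngram[0]]
    for tok in ngram[1:]:
        new_beta = torch.full((L,), neg_inf)
        for v in range(1, L):
            # Extend a partial match ending at some earlier vertex u < v
            # through a direct transition u -> v, emitting `tok` at v.
            new_beta[v] = torch.logsumexp(beta[:v] + trans_logp[:v, v], dim=0) \
                          + emit_logp[v, tok]
        beta = new_beta

    # Sum over all vertices where the n-gram can end; back to prob space.
    return torch.logsumexp(beta, dim=0).exp()
```

Under these assumptions, the expected count decomposes as a sum over vertex chains $v_1 < \dots < v_n$ of the visiting probability of $v_1$, the emission probabilities of the n-gram tokens, and the transition probabilities between consecutive vertices, which is what the two loops accumulate in log space.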
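
For quick reference, the quoted training setup can be collected into the following configuration sketch. The dictionary keys are our own shorthand, and the fairseq flag names in the comments refer to standard fairseq options assumed for illustration rather than taken from the FA-DAT repository's scripts.

```python
# Hyperparameters as quoted in the Experiment Setup row; the mapping to
# fairseq CLI flags (in comments) is an assumption based on standard
# fairseq options, not on the FA-DAT repository.
PRETRAIN = {
    "dropout": 0.1,            # --dropout
    "weight_decay": 0.01,      # --weight-decay
    "label_smoothing": 0.0,    # label smoothing disabled
    "max_update": 300_000,     # --max-update
    "batch_tokens": 64_000,    # effective tokens per update
    "peak_lr": 5e-4,           # --lr
    "warmup_updates": 10_000,  # --warmup-updates
}

FINETUNE = {
    "dropout": 0.1,
    "weight_decay": 0.01,
    "label_smoothing": 0.0,
    "max_update": 5_000,
    "batch_tokens": 256_000,   # larger batch to stabilize gradients
    "peak_lr": 2e-4,
    "warmup_updates": 500,
}
```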