Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Lexically Constrained Neural Machine Translation with Explicit Alignment Guidance

Authors: Guanhua Chen, Yun Chen, Victor O.K. Li
Pages: 12630-12638

AAAI 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on WMT16 De-En and WMT16 Ro-En show the effectiveness of our approaches on constrained NMT. In particular, the proposed EAM-OUTPUT method consistently outperforms previous approaches in translation quality, with light computational overheads over unconstrained baseline.
Researcher Affiliation	Academia	(1) The University of Hong Kong; (2) Shanghai University of Finance and Economics
Pseudocode	No	The paper describes the decoding process and methods verbally and with a diagram (Figure 1), but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code	Yes	Code is public at https://github.com/ghchen18/cdalign
Open Datasets	Yes	Models are trained on WMT16 De-En and WMT16 Ro-En training set and evaluated on alignment testset and WMT news translation testset. ... For the alignment testset, we use the hand-aligned, publicly available alignment testset for De-En [4] and Ro-En [5]. [4] https://www-i6.informatik.rwth-aachen.de/goldAlignment [5] http://web.eecs.umich.edu/~mihalcea/wpt/index.html#resources
Dataset Splits	Yes	We use newstest2013 and newsdev2016 as development sets for De-En and Ro-En respectively.
Hardware Specification	Yes	The decoding speed is tested on a single GeForce RTX 2080Ti GPU.
Software Dependencies	Yes	Model is implemented on fairseq toolkit [6] (Ott et al. 2019). ... We report case-sensitive BLEU score using sacreBLEU [7] (Post 2018). BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.3
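The sacreBLEU signature quoted above encodes the exact scoring settings: mixed-case (case-sensitive) scoring, one reference, exponential smoothing, the 13a tokenizer, and sacreBLEU version 1.4.3. As a minimal sketch, assuming the sacreBLEU 1.x format in which fields are joined by "+" and each field is "key.value", the string can be split into its settings like this (the function name `parse_sacrebleu_signature` is hypothetical, not part of sacreBLEU):

```python
from __future__ import annotations


def parse_sacrebleu_signature(sig: str) -> dict[str, str]:
    """Split a sacreBLEU 1.x-style signature into a field -> value mapping.

    Hypothetical helper for illustration; assumes '+'-separated fields of
    the form '<key>.<value>', preceded by the metric name.
    """
    # The first '+'-separated token names the metric (e.g. "BLEU").
    metric, *fields = sig.split("+")
    parsed = {"metric": metric}
    for field in fields:
        # Split on the first '.' only, so dotted values such as "1.4.3" stay intact.
        key, _, value = field.partition(".")
        parsed[key] = value
    return parsed


SIGNATURE = "BLEU+case.mixed+numrefs.1+smooth.exp+tok.13a+version.1.4.3"
info = parse_sacrebleu_signature(SIGNATURE)
for key, value in info.items():
    print(f"{key}: {value}")
```

Reporting the full signature alongside the score lets a reader rerun sacreBLEU with identical tokenization and smoothing, since BLEU values are not comparable across different settings.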
Experiment Setup	Yes	The learning rate is 0.0005 and warmup step is 4000. All the dropout probabilities are set to 0.3. The batch size is 32k tokens. Maximum updates number is 100k for the De-En language pair and 50k for the Ro-En language pair. For training the EAM, the maximum updates number is 10k.
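The setup gives a peak learning rate of 0.0005 reached after 4000 warmup steps, but does not name the decay schedule. A minimal sketch, assuming linear warmup from 0 followed by inverse-square-root decay (the shape of fairseq's `inverse_sqrt` scheduler, commonly used for Transformer NMT), shows what the learning rate would be at the reported maximum update counts. The function `inverse_sqrt_lr` and its defaults are illustrative assumptions, not the authors' code:

```python
import math


def inverse_sqrt_lr(step: int, peak_lr: float = 5e-4, warmup: int = 4000,
                    init_lr: float = 0.0) -> float:
    """Learning rate at an update step under linear warmup + inverse-sqrt decay.

    Assumption: only peak_lr (0.0005) and warmup (4000) come from the paper;
    the inverse-square-root decay after warmup is assumed for illustration.
    """
    if step < warmup:
        # Linear ramp from init_lr to peak_lr over the warmup period.
        return init_lr + step * (peak_lr - init_lr) / warmup
    # After warmup, decay as 1/sqrt(step); equals peak_lr exactly at step == warmup.
    return peak_lr * math.sqrt(warmup / step)


# Maximum update counts reported for the two main training runs.
MAX_UPDATES = {"De-En": 100_000, "Ro-En": 50_000}

for pair, steps in MAX_UPDATES.items():
    print(f"{pair}: lr at update {steps} = {inverse_sqrt_lr(steps):.2e}")
```

Under this assumed schedule the rate at the final update is 1/sqrt(steps / 4000) of the peak, so the longer De-En run ends at a lower rate than the Ro-En run.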