Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)
Authors: Huang Bojun
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a demonstration of this potential, we developed a simple imitation learning algorithm based on the Lagrangian duality theory. We presented two practical variants of the algorithm, and empirically applied them to Machine Translation (MT) as a case study. Our algorithms are able to train Transformer (Vaswani et al., 2017), a state-of-the-art neural network model, on large-scale MT benchmarks with real-world data, and lead to 1.4 BLEU (=5%) improvement over standard baseline (Section 4). |
| Researcher Affiliation | Industry | Rakuten Institute of Technology, Rakuten Group Inc., Japan. Correspondence to: Huang Bojun <bojhuang@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 The LAMIN1 algorithm. Algorithm 2 The LAMIN2 algorithm. |
| Open Source Code | Yes | The experimentation code in Supplementary Material contains a faithful implementation in Python of the formulation presented here. |
| Open Datasets | Yes | We tested our algorithms using the WMT 2014 English-German dataset. |
| Dataset Splits | No | The paper mentions a "held-out test set" but does not provide percentages or counts for training, validation, and test splits, nor does it mention a validation set. It only states, "The training data consists of 4.5 million translation episodes." |
| Hardware Specification | Yes | We trained the model on the same 4.5 million sentence pairs in the WMT 14 dataset for 100,000 gradient updates on a V100 GPU. |
| Software Dependencies | No | The paper mentions "SacreBLEU (Post, 2018)" and "YouTokenToMe" as tools used, and that gradient computation "can conveniently run on automatic differentiation libraries such as PyTorch", but it does not provide specific version numbers for these software dependencies to ensure reproducibility. A hedged SacreBLEU evaluation sketch appears after this table. |
| Experiment Setup | Yes | We trained the Transformer model using LAMIN1 and LAMIN2, with varying temperature β, then tested the Q-greedy policy with the standard BLEU metric. We trained the model on the same 4.5 million sentence pairs in the WMT 14 dataset for 100,000 gradient updates on a V100 GPU, with the same mini-batch size (and token-padding strategy) and learning rate schedule as recommended by Vaswani et al. (2017). A dropout rate of 0.1 was applied. A hedged sketch of that learning-rate schedule appears after this table. |
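
The Software Dependencies row names SacreBLEU, YouTokenToMe, and PyTorch but no versions. As a hedged illustration of how the reported BLEU scores could be re-computed once system outputs are available, here is a minimal Python sketch using the public `sacrebleu` API; the file names and the helper `evaluate_bleu` are hypothetical and not taken from the paper.

```python
# Minimal sketch (not from the paper): corpus-level BLEU with SacreBLEU.
# `hyps.txt` and `refs.txt` are hypothetical file names holding one detokenized
# sentence per line; the paper does not specify its evaluation script.
import sacrebleu


def evaluate_bleu(hyp_path: str, ref_path: str) -> float:
    """Return corpus BLEU for one hypothesis file against one reference file."""
    with open(hyp_path, encoding="utf-8") as f:
        hypotheses = [line.rstrip("\n") for line in f]
    with open(ref_path, encoding="utf-8") as f:
        references = [line.rstrip("\n") for line in f]
    # corpus_bleu takes a list of hypotheses and a list of reference streams
    # (one stream per reference set) and returns a BLEUScore object.
    result = sacrebleu.corpus_bleu(hypotheses, [references])
    return result.score


if __name__ == "__main__":
    print(f"BLEU = {evaluate_bleu('hyps.txt', 'refs.txt'):.2f}")
```

Reporting the installed `sacrebleu` version together with its metric signature string is the usual way to close the versioning gap flagged in that row.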
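The Experiment Setup row defers to the learning-rate schedule of Vaswani et al. (2017), i.e. lrate = d_model^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)). Below is a minimal PyTorch sketch of that schedule, assuming the original Transformer defaults (d_model = 512, 4,000 warm-up steps, Adam with betas (0.9, 0.98) and eps 1e-9); the paper under review does not restate these constants, so they are assumptions here, not its reported settings.

```python
# Minimal sketch (not from the paper): the inverse-square-root learning-rate
# schedule of Vaswani et al. (2017), attached to a PyTorch optimizer.
# d_model=512, warmup=4000, and the Adam hyper-parameters are the original
# Transformer defaults, assumed here because the paper does not restate them.
import torch
from torch.optim.lr_scheduler import LambdaLR


def noam_factor(d_model: int = 512, warmup: int = 4000):
    """Learning-rate factor lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    def factor(step: int) -> float:
        step = max(step, 1)  # LambdaLR evaluates the lambda at step 0 on construction
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)
    return factor


# Stand-in module; the paper trains a full Transformer with dropout 0.1 instead.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.Dropout(p=0.1))
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,  # base lr of 1.0 is scaled by factor()
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = LambdaLR(optimizer, lr_lambda=noam_factor())

# In training, each of the paper's 100,000 gradient updates would be followed by
# scheduler.step(); here we just print the schedule at a few checkpoints.
for step in (1, 100, 4_000, 100_000):
    print(f"step {step:>7,}: learning rate = {noam_factor()(step):.2e}")
```

The printout shows the linear warm-up followed by inverse-square-root decay, which is the only part of the optimization recipe the paper specifies by reference rather than explicitly.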