Lagrangian Method for Q-Function Learning (with Applications to Machine Translation)
Authors: Huang Bojun
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As a demonstration of this potential, we developed a simple imitation learning algorithm based on the Lagrangian duality theory. We presented two practical variants of the algorithm, and empirically applied them to Machine Translation (MT) as a case study. Our algorithms are able to train Transformer (Vaswani et al., 2017), a state-of-the-art neural network model, on large-scale MT benchmarks with real-world data, and lead to 1.4 BLEU (=5%) improvement over standard baseline (Section 4). |
| Researcher Affiliation | Industry | Rakuten Institute of Technology, Rakuten Group Inc., Japan. Correspondence to: Huang Bojun <bojhuang@gmail.com>. |
| Pseudocode | Yes | Algorithm 1 The LAMIN1 algorithm. Algorithm 2 The LAMIN2 algorithm. |
| Open Source Code | Yes | The experimentation code in Supplementary Material contains a faithful implementation in Python of the formulation presented here. |
| Open Datasets | Yes | We tested our algorithms using the WMT 2014 English-German dataset. |
| Dataset Splits | No | The paper mentions a "held-out test set" but does not provide percentages or counts for training, validation, and test splits, nor does it mention a validation set. It only states, "The training data consists of 4.5 million translation episodes." |
| Hardware Specification | Yes | We trained the model on the same 4.5 million sentence pairs in the WMT 14 dataset for 100,000 gradient updates on a V100 GPU. |
| Software Dependencies | No | The paper mentions "SacreBLEU (Post, 2018)" and "YouTokenToMe" as tools used, and that gradient computation "can conveniently run on automatic differentiation libraries such as PyTorch", but it does not provide specific version numbers for these software dependencies to ensure reproducibility. A hedged SacreBLEU evaluation sketch appears after this table. |
| Experiment Setup | Yes | We trained the Transformer model using LAMIN1 and LAMIN2, with varying temperature β, then tested the Q-greedy policy with the standard BLEU metric. We trained the model on the same 4.5 million sentence pairs in the WMT 14 dataset for 100,000 gradient updates on a V100 GPU, with the same mini-batch size (and token-padding strategy) and learning rate schedule as recommended by Vaswani et al. (2017). A dropout rate of 0.1 was applied. A hedged sketch of that learning-rate schedule appears after this table. |
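
The Software Dependencies row names SacreBLEU, YouTokenToMe, and PyTorch but no versions. As a hedged illustration of how the reported BLEU scores could be re-computed once system outputs are available, here is a minimal Python sketch using the public `sacrebleu` API; the file names and the helper `evaluate_bleu` are hypothetical and not taken from the paper.

```python
# Minimal sketch (not from the paper): corpus-level BLEU with SacreBLEU.
# `hyps.txt` and `refs.txt` are hypothetical file names holding one detokenized
# sentence per line; the paper does not specify its evaluation script.
import sacrebleu


def evaluate_bleu(hyp_path: str, ref_path: str) -> float:
    """Return corpus BLEU for one hypothesis file against one reference file."""
    with open(hyp_path, encoding="utf-8") as f:
        hypotheses = [line.rstrip("\n") for line in f]
    with open(ref_path, encoding="utf-8") as f:
        references = [line.rstrip("\n") for line in f]
    # corpus_bleu takes a list of hypotheses and a list of reference streams
    # (one stream per reference set) and returns a BLEUScore object.
    result = sacrebleu.corpus_bleu(hypotheses, [references])
    return result.score


if __name__ == "__main__":
    print(f"BLEU = {evaluate_bleu('hyps.txt', 'refs.txt'):.2f}")
```

Reporting the installed `sacrebleu` version together with its metric signature string is the usual way to close the versioning gap flagged in that row.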
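The Experiment Setup row defers to the learning-rate schedule of Vaswani et al. (2017), i.e. lrate = d_model^(-0.5) · min(step^(-0.5), step · warmup^(-1.5)). Below is a minimal PyTorch sketch of that schedule, assuming the original Transformer defaults (d_model = 512, 4,000 warm-up steps, Adam with betas (0.9, 0.98) and eps 1e-9); the paper under review does not restate these constants, so they are assumptions here, not its reported settings.

```python
# Minimal sketch (not from the paper): the inverse-square-root learning-rate
# schedule of Vaswani et al. (2017), attached to a PyTorch optimizer.
# d_model=512, warmup=4000, and the Adam hyper-parameters are the original
# Transformer defaults, assumed here because the paper does not restate them.
import torch
from torch.optim.lr_scheduler import LambdaLR


def noam_factor(d_model: int = 512, warmup: int = 4000):
    """Learning-rate factor lrate = d_model^-0.5 * min(step^-0.5, step * warmup^-1.5)."""
    def factor(step: int) -> float:
        step = max(step, 1)  # LambdaLR evaluates the lambda at step 0 on construction
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup ** -1.5)
    return factor


# Stand-in module; the paper trains a full Transformer with dropout 0.1 instead.
model = torch.nn.Sequential(torch.nn.Linear(512, 512), torch.nn.Dropout(p=0.1))
optimizer = torch.optim.Adam(model.parameters(), lr=1.0,  # base lr of 1.0 is scaled by factor()
                             betas=(0.9, 0.98), eps=1e-9)
scheduler = LambdaLR(optimizer, lr_lambda=noam_factor())

# In training, each of the paper's 100,000 gradient updates would be followed by
# scheduler.step(); here we just print the schedule at a few checkpoints.
for step in (1, 100, 4_000, 100_000):
    print(f"step {step:>7,}: learning rate = {noam_factor()(step):.2e}")
```

The printout shows the linear warm-up followed by inverse-square-root decay, which is the only part of the optimization recipe the paper specifies by reference rather than explicitly.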