Multi-Agent Dual Learning
Authors: Yiren Wang, Yingce Xia, Tianyu He, Fei Tian, Tao Qin, ChengXiang Zhai, Tie-Yan Liu
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on neural machine translation and image translation tasks demonstrate the effectiveness of the new framework. |
| Researcher Affiliation | Collaboration | University of Illinois at Urbana-Champaign; Microsoft Research; University of Science and Technology of China |
| Pseudocode | Yes | Algorithm 1: Algorithm for multi-agent dual learning. |
| Open Source Code | No | The paper refers to using third-party open-source implementations (e.g., 'All the implementations are based on the official tensor2tensor release: https://github.com/tensorflow/tensor2tensor.', 'We implement our method based on CycleGAN.', 'based on FairSeq toolkit'), but does not state that the authors themselves are releasing the specific code for their multi-agent dual learning method. |
| Open Datasets | Yes | For IWSLT 2014 En↔De translation, following Edunov et al. (2018b), we lowercase all the sentences, and split them into training/validation/test sets with 153k/7k/7k sentences respectively. For WMT 2014 En↔De translation, we choose the WMT 2014 training set and filter out 4.5M sentence pairs following Gehring et al. (2017) and Vaswani et al. (2017). For unsupervised En↔De, following Lample et al. (2018), we choose 50M monolingual English and German sentences. |
| Dataset Splits | Yes | For IWSLT 2014 En↔De translation, following Edunov et al. (2018b), we lowercase all the sentences, and split them into training/validation/test sets with 153k/7k/7k sentences respectively. For WMT 2014 En↔De translation...We concatenate newstest2012 and newstest2013 as the validation set and use newstest2014 as the test set. |
| Hardware Specification | Yes | For the three tasks, we use one, eight and four M40 GPUs to train those networks for three, five and six days respectively. We train the model on 8 M40 GPUs for 2 days with a PyTorch implementation of our algorithm based on the FairSeq toolkit. |
| Software Dependencies | No | The paper mentions software such as 'Adam', 'Transformer', 'tensor2tensor', 'PyTorch', 'FairSeq toolkit', 'CycleGAN', and 'sacreBLEU', but does not provide specific version numbers for the core libraries used in their method. While a sacreBLEU version is mentioned in a footnote (version 1.2.11), this is an evaluation tool rather than a direct dependency of their methodology, and other key components lack versions. (A sacreBLEU scoring sketch follows the table.) |
| Experiment Setup | Yes | For IWSLT En↔De, we use the transformer small configuration with 4 and 8 blocks...word embedding dimension, hidden state dimension and non-linear layer dimension are set to 256, 256 and 1024 respectively. For the WMT task...transformer big setting with 6 blocks...dimensions are 1024, 1024 and 4096 respectively. For the unsupervised NMT task...transformer base setting...dimensions are 512, 512 and 2048. The dropout rates for the three settings are 0.2, 0.1 and 0.1 respectively. The η in Algorithm 1 is set to 2 × 10⁻⁴. The learning rate decay rule and the two βs of Adam are the same as Vaswani et al. (2017). Beam search is applied...beam sizes for the three tasks are 6, 4 and 4 respectively. (A configuration sketch based on these settings appears after the table.) |
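
The "Experiment Setup" row packs the per-task hyperparameters into a single paragraph. The following is a minimal sketch that restates those numbers as a configuration table; the dictionary keys and field names are illustrative (they are not taken from the authors' code), the values come from the quoted setup, and the Adam βs are the standard Vaswani et al. (2017) values the row points to.

```python
# Hypothetical restatement of the quoted hyperparameters; field names are
# illustrative, values are taken from the "Experiment Setup" row above.
TASK_CONFIGS = {
    "iwslt14_en_de": {        # "transformer small" configuration
        "embed_dim": 256,
        "hidden_dim": 256,
        "ffn_dim": 1024,
        "dropout": 0.2,
        "beam_size": 6,
    },
    "wmt14_en_de": {          # "transformer big" configuration, 6 blocks
        "num_blocks": 6,
        "embed_dim": 1024,
        "hidden_dim": 1024,
        "ffn_dim": 4096,
        "dropout": 0.1,
        "beam_size": 4,
    },
    "unsupervised_en_de": {   # "transformer base" configuration
        "embed_dim": 512,
        "hidden_dim": 512,
        "ffn_dim": 2048,
        "dropout": 0.1,
        "beam_size": 4,
    },
}

# Shared optimizer settings: eta in Algorithm 1 is 2e-4; the learning-rate
# decay rule and the two Adam betas follow Vaswani et al. (2017).
OPTIMIZER = {"eta": 2e-4, "adam_beta1": 0.9, "adam_beta2": 0.98}
```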
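
For the "Software Dependencies" row, the only versioned tool is sacreBLEU (1.2.11), which the paper uses for evaluation only. Below is a minimal scoring sketch using the `sacrebleu` Python package's `corpus_bleu` API; the file names are hypothetical, and the paper itself reports scores from the command-line tool rather than this API.

```python
# Minimal sketch: corpus-level BLEU on detokenized output with sacrebleu.
# File names are hypothetical placeholders for system output and reference.
import sacrebleu

with open("newstest2014.hyp.detok") as f:
    hypotheses = [line.strip() for line in f]
with open("newstest2014.ref.detok") as f:
    references = [line.strip() for line in f]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU = {bleu.score:.2f}")
```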