Text Generation by Learning from Demonstrations
Authors: Richard Yuanzhe Pang, He He
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results on news summarization, question generation, and machine translation show that GOLD leads to better model performance than MLE and RL fine-tuning, as measured by both automatic task metrics and human-rated quality. |
| Researcher Affiliation | Academia | Richard Yuanzhe Pang¹, He He¹,²; yzpang@nyu.edu, hehe@cs.nyu.edu. ¹Courant Institute of Mathematical Sciences, New York University, New York, NY 10011, USA; ²Center for Data Science, New York University, New York, NY 10011, USA |
| Pseudocode | Yes | Algorithm 1: GOLD (a hedged sketch of the corresponding loss follows the table) |
| Open Source Code | Yes | The code is available. Code: https://github.com/yzpang/gold-off-policy-text-gen-iclr21 |
| Open Datasets | Yes | We chose four text generation tasks: (1) question generation (NQG; Zhou et al., 2017): ...; (2) summarization (CNN/DM; Hermann et al., 2015); (3) extreme summarization (XSum; Narayan et al., 2018): ...; (4) machine translation (IWSLT14 De-En; Cettolo et al., 2014). |
| Dataset Splits | Yes | The train/dev/test split for NQG is 86229/8913/8919; the split for CNN/DM is 287227/13368/11490; the split for XSum is 204045/11332/11334; the split for IWSLT14 De-En is 160239/7283/6750. |
| Hardware Specification | Yes | We train using a single Nvidia GTX 1080 Ti (memory: 12 GB) GPU. For transformer models, we use Nvidia P40 GPUs (memory: 24 GB each). |
| Software Dependencies | No | The paper mentions 'fairseq' and refers to an implementation based on 'Cho et al. (2019)' but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | We use a learning rate of 5e-4. For NQG, we use a batch size of 32; for CNN/DM we use a batch size of 16. For transformer models, we use a learning rate of 2e-5 for NQG, CNN/DM, and XSum; 3e-4 for IWSLT14 De-En. For NQG, we use 512 tokens as batch size (for each of the four GPUs); for CNN/DM and XSum, we use 1024 tokens as batch size (for each of the four GPUs); for IWSLT14 De-En, we use 4096 tokens as batch size. |
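The Pseudocode row above references Algorithm 1 (GOLD), an off-policy policy-gradient method trained on demonstrations. As a rough illustration only, here is a minimal PyTorch-style sketch of a GOLD-δ-like weighted NLL loss, assuming the per-token importance weight is approximated by the policy's own detached probability with a lower bound; the function name, the `weight_floor` default, and the tensor layout are illustrative and are not taken from the authors' fairseq implementation.

```python
import torch
import torch.nn.functional as F

def gold_delta_loss(logits, targets, pad_id, weight_floor=0.1):
    """Hedged sketch of a GOLD-delta-style weighted NLL loss (not the official implementation).

    logits:  (batch, seq_len, vocab) decoder outputs on teacher-forced demonstrations
    targets: (batch, seq_len) gold demonstration tokens
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # log-probability of each demonstration token under the current policy
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # approximate the per-token importance weight with the policy's own probability,
    # detached so it rescales the gradient but does not receive one, floored for stability
    weight = tok_logp.detach().exp().clamp(min=weight_floor)
    mask = (targets != pad_id).float()
    # with the delta-reward, the objective reduces to a weighted NLL over demonstration tokens
    loss = -(weight * tok_logp * mask).sum() / mask.sum()
    return loss
```

In this sketch, setting `weight_floor=1.0` forces every weight to 1 and recovers ordinary token-level cross-entropy (MLE) on the demonstrations, which is a convenient sanity check.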
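For quick reference, the hyperparameters quoted in the Experiment Setup row can be collected into a small config sketch. The key names below are illustrative and are not the option names used in the authors' fairseq configs, and attributing the first learning rate and batch sizes to the non-transformer runs is an assumption based on the phrasing of the quote.

```python
# Hedged summary of the reported fine-tuning hyperparameters.
HYPERPARAMS = {
    # Assumption: these first values describe the non-transformer runs.
    "non_transformer": {
        "lr": 5e-4,
        "batch_size_sequences": {"nqg": 32, "cnn_dm": 16},
    },
    "transformer": {
        "lr": {"nqg": 2e-5, "cnn_dm": 2e-5, "xsum": 2e-5, "iwslt14_de_en": 3e-4},
        # Token-level batch sizes; per GPU on four GPUs for NQG, CNN/DM, and XSum.
        "max_tokens": {"nqg": 512, "cnn_dm": 1024, "xsum": 1024, "iwslt14_de_en": 4096},
    },
}
```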