Text Generation by Learning from Demonstrations

Authors: Richard Yuanzhe Pang, He He

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results on news summarization, question generation, and machine translation show that GOLD leads to better model performance than MLE and RL fine-tuning by both task metrics and human-rated quality.
Researcher Affiliation | Academia | Richard Yuanzhe Pang (1), He He (1, 2); yzpang@nyu.edu, hehe@cs.nyu.edu; (1) Courant Institute of Mathematical Sciences, New York University, New York, NY 10011, USA; (2) Center for Data Science, New York University, New York, NY 10011, USA
Pseudocode | Yes | Algorithm 1: GOLD. (A minimal loss sketch follows the table below.)
Open Source Code | Yes | The code is available. Code: https://github.com/yzpang/gold-off-policy-text-gen-iclr21
Open Datasets | Yes | We chose four text generation tasks: (1) question generation (NQG; Zhou et al., 2017): ...; (2) summarization (CNN/DM; Hermann et al., 2015); (3) extreme summarization (XSum; Narayan et al., 2018): ...; (4) machine translation (IWSLT14 De-En; Cettolo et al., 2014).
Dataset Splits | Yes | The train/dev/test split for NQG is 86229/8913/8919; the split for CNN/DM is 287227/13368/11490; the split for XSum is 204045/11332/11334; the split for IWSLT14 De-En is 160239/7283/6750.
Hardware Specification | Yes | We train using a single Nvidia GTX 1080 Ti (memory: 12 GB) GPU. For transformer models, we use Nvidia P40 GPUs (memory: 24 GB each).
Software Dependencies | No | The paper mentions 'fairseq' and refers to an implementation based on Cho et al. (2019), but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | We use a learning rate of 5e-4. For NQG, we use a batch size of 32; for CNN/DM, we use a batch size of 16. For transformer models, we use a learning rate of 2e-5 for NQG, CNN/DM, and XSum, and 3e-4 for IWSLT14 De-En. For NQG, we use 512 tokens as the batch size (for each of the four GPUs); for CNN/DM and XSum, we use 1024 tokens as the batch size (for each of the four GPUs); for IWSLT14 De-En, we use 4096 tokens as the batch size. (These numbers are collected in the summary sketch after the table.)
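
To make Algorithm 1 (GOLD) more concrete, below is a minimal sketch of a GOLD-style training loss in PyTorch, following our reading of the paper's delta-reward variant: each demonstration token's negative log-likelihood is weighted by the model's own detached probability of that token, clipped below at a constant u so low-probability tokens still receive some gradient. The function name gold_loss, the default u=0.1, and the tensor shapes are our assumptions; this is a sketch, not the authors' fairseq implementation.

```python
import torch
import torch.nn.functional as F

def gold_loss(logits, targets, pad_id, u=0.1):
    """GOLD-style off-policy loss sketch (delta-reward variant).

    logits:  (batch, seq_len, vocab) unnormalized model scores
    targets: (batch, seq_len) gold token ids (the demonstrations)
    u:       lower bound on the per-token importance weight (assumed value)
    """
    log_probs = F.log_softmax(logits, dim=-1)
    # Log-probability the model assigns to each gold token.
    tok_logp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Approximate per-token importance weight: pi_theta(a_t | s_t),
    # detached so it acts as a fixed weight, clipped below at u.
    weights = tok_logp.detach().exp().clamp(min=u)
    mask = (targets != pad_id).float()
    return -(weights * tok_logp * mask).sum() / mask.sum()

# Toy usage: batch of 2 sequences, length 5, vocab size 11, pad id 0.
logits = torch.randn(2, 5, 11)
targets = torch.randint(1, 11, (2, 5))
print(gold_loss(logits, targets, pad_id=0))
```

Intuitively, tokens the current model already finds likely are up-weighted while unlikely tokens are down-weighted (but never below u), which is what distinguishes this weighted objective from plain MLE.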
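
Since the reported hyperparameters are scattered across several sentences, here is a hypothetical regrouping for side-by-side comparison. The key names are ours, and attributing the first learning rate and batch sizes to the non-transformer models is our reading of the quote, not something the table states explicitly.

```python
# Hypothetical regrouping of the quoted hyperparameters; key names are ours.
EXPERIMENT_SETUP = {
    "non_transformer": {
        "lr": 5e-4,
        "batch_size": {"NQG": 32, "CNN/DM": 16},  # sequences per batch
    },
    "transformer": {
        "lr": {"NQG": 2e-5, "CNN/DM": 2e-5, "XSum": 2e-5, "IWSLT14 De-En": 3e-4},
        # Token-level batch sizes; per GPU (of four) for NQG, CNN/DM, and XSum.
        "max_tokens": {"NQG": 512, "CNN/DM": 1024, "XSum": 1024,
                       "IWSLT14 De-En": 4096},
    },
}
```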