Teacher Forcing Recovers Reward Functions for Text Generation

Authors: Yongchang Hao, Yuxin Liu, Lili Mou

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conduct experiments on dialogue generation and paraphrase generation. The empirical results suggest that our method leads to better performance compared with several baselines, including self-training and task-specific heuristic reward learning, on both tasks. This confirms the effectiveness and generality of our framework." |
| Researcher Affiliation | Academia | Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada; Canada CIFAR AI Chair, Amii. {yongcha1,yliu17}@ualberta.ca, doublepower.mou@gmail.com |
| Pseudocode | Yes | "Algorithm 1 summarizes our approach." |
| Open Source Code | Yes | "Our code is publicly available at https://github.com/MANGA-UOFA/LMReward" |
| Open Datasets | Yes | "We adopt two widely used datasets, Daily Dialog [30] and Open Subtitles [57], for the dialogue experiment. The Quora dataset is originally designed for paraphrase classification, containing both paraphrase and non-paraphrase pairs. The paraphrase pairs naturally form a parallel dataset for the generation purpose; following the common practice [34], we split it into 124K/4K/20K samples for training/validation/test." |
| Dataset Splits | Yes | "Daily Dialog... containing 60K/6.5K/7K samples for training/validation/test in Daily Dialog and 1M non-parallel samples in Open Subtitles. The Quora dataset... we split it into 124K/4K/20K samples for training/validation/test." (See the sketch after this table.) |
| Hardware Specification | No | The paper states, "First, the scale of the experiments in this paper is restricted by computational resources," but does not provide specific details about the hardware used (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using the "T5-Base model [42]" and the "NLTK library [33]" but does not provide version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper states, "Appendix B provides implementation details and hyperparameters of our approach." However, Appendix B is not included in the provided text, so specific experimental setup details are not available in the main body. |
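The Quora split in the table above is described only by its subset sizes (124K/4K/20K, following [34]); the exact split files are not reproduced here. Below is a minimal, hypothetical Python sketch of how such a split could be recreated from a list of paraphrase pairs. The `split_quora` helper, the fixed seed, and the shuffle order are all illustrative assumptions, not the authors' released preprocessing.

```python
import random

# Hypothetical helper: recreate a 124K/4K/20K train/valid/test split of
# Quora paraphrase pairs. The paper follows the split of [34]; the seed
# and shuffle order below are assumptions for illustration only.
TRAIN_SIZE, VALID_SIZE, TEST_SIZE = 124_000, 4_000, 20_000

def split_quora(pairs, seed=0):
    """Shuffle paraphrase pairs and slice them into train/valid/test."""
    total = TRAIN_SIZE + VALID_SIZE + TEST_SIZE
    if len(pairs) < total:
        raise ValueError("Not enough paraphrase pairs for a 124K/4K/20K split.")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    train = shuffled[:TRAIN_SIZE]
    valid = shuffled[TRAIN_SIZE:TRAIN_SIZE + VALID_SIZE]
    test = shuffled[TRAIN_SIZE + VALID_SIZE:total]
    return train, valid, test
```

For a faithful reproduction, one would instead consult the preprocessing in the paper's released code at https://github.com/MANGA-UOFA/LMReward rather than rely on this sketch.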