Teacher Forcing Recovers Reward Functions for Text Generation
Authors: Yongchang Hao, Yuxin Liu, Lili Mou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on dialogue generation and paraphrase generation. The empirical results suggest that our method leads to better performance compared with several baselines, including self-training and task-specific heuristic reward learning, on both tasks. This confirms the effectiveness and generality of our framework. |
| Researcher Affiliation | Academia | Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada; Canada CIFAR AI Chair, Amii; {yongcha1,yliu17}@ualberta.ca, doublepower.mou@gmail.com |
| Pseudocode | Yes | Algorithm 1 summarizes our approach. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/MANGA-UOFA/LMReward |
| Open Datasets | Yes | We adopt two widely used datasets, Daily Dialog [30] and Open Subtitles [57], for the dialogue experiment. ... The Quora dataset is originally designed for paraphrase classification, containing both paraphrase and non-paraphrase pairs. The paraphrase pairs naturally form a parallel dataset for the generation purpose; following the common practice [34], we split it into 124K/4K/20K samples for training/validation/test. |
| Dataset Splits | Yes | Daily Dialog... containing 60K/6.5K/7K samples for training/validation/test in Daily Dialog and 1M non-parallel samples in Open Subtitles. The Quora dataset... we split it into 124K/4K/20K samples for training/validation/test. |
| Hardware Specification | No | The paper states, 'First, the scale of the experiments in this paper is restricted by computational resources,' but does not provide any specific details about the hardware used (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using the 'T5-Base model [42]' and the 'NLTK library [33]' but does not provide version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'Appendix B provides implementation details and hyperparameters of our approach.' However, Appendix B is not included in the provided text, so specific experimental setup details are not available in the main body. |