Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Teacher Forcing Recovers Reward Functions for Text Generation
Authors: Yongchang Hao, Yuxin Liu, Lili Mou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on dialogue generation and paraphrase generation. The empirical results suggest that our method leads to better performance compared with several baselines, including self-training and task-specific heuristic reward learning, on both tasks. This confirms the effectiveness and generality of our framework. |
| Researcher Affiliation | Academia | Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada; Canada CIFAR AI Chair, Amii; EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1 summarizes our approach. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/MANGA-UOFA/LMReward |
| Open Datasets | Yes | We adopt two widely used datasets, Daily Dialog [30] and Open Subtitles [57], for the dialogue experiment. The Quora dataset is originally designed for paraphrase classification, containing both paraphrase and non-paraphrase pairs. The paraphrase pairs naturally form a parallel dataset for the generation purpose; following the common practice [34], we split it into 124K/4K/20K samples for training/validation/test. |
| Dataset Splits | Yes | Daily Dialog... containing 60K/6.5K/7K samples for training/validation/test in Daily Dialog and 1M non-parallel samples in Open Subtitles. The Quora dataset... we split it into 124K/4K/20K samples for training/validation/test. |
| Hardware Specification | No | The paper states, 'First, the scale of the experiments in this paper is restricted by computational resources,' but does not provide any specific details about the hardware used (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using 'T5-Base model [42]' and 'NLTK library [33]' but does not provide version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper states, 'Appendix B provides implementation details and hyperparameters of our approach.' However, Appendix B is not included in the provided text, so specific experimental setup details are not available in the main body. |
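The dataset-split sizes reported above (Quora: 124K/4K/20K for training/validation/test) can be sanity-checked with a minimal sketch. Note this is a hypothetical illustration of the split arithmetic only, with an assumed helper `split_dataset` and an assumed shuffle seed; the authors' actual split procedure follows [34] and their released code at the repository linked above.

```python
import random

def split_dataset(pairs, n_train=124_000, n_valid=4_000, n_test=20_000, seed=0):
    """Shuffle and partition paraphrase pairs into train/valid/test.

    Hypothetical helper: the sizes match the paper's reported Quora split,
    but the seed and shuffling scheme are illustrative assumptions.
    """
    rng = random.Random(seed)
    pairs = list(pairs)
    rng.shuffle(pairs)
    assert len(pairs) >= n_train + n_valid + n_test, "not enough pairs to split"
    train = pairs[:n_train]
    valid = pairs[n_train:n_train + n_valid]
    test = pairs[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

# Usage with dummy paraphrase pairs (148K total, matching 124K + 4K + 20K):
dummy = [(f"q{i}", f"p{i}") for i in range(148_000)]
train, valid, test = split_dataset(dummy)
print(len(train), len(valid), len(test))  # 124000 4000 20000
```

A fixed seed is the usual way to make such a split reproducible; without one, rerunning the split yields different partitions and the reported numbers cannot be matched exactly.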