Teacher Forcing Recovers Reward Functions for Text Generation

Authors: Yongchang Hao, Yuxin Liu, Lili Mou

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We conduct experiments on dialogue generation and paraphrase generation. The empirical results suggest that our method leads to better performance compared with several baselines, including self-training and task-specific heuristic reward learning, on both tasks. This confirms the effectiveness and generality of our framework." |
| Researcher Affiliation | Academia | Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta, Canada; Canada CIFAR AI Chair, Amii. {yongcha1,yliu17}@ualberta.ca, doublepower.mou@gmail.com |
| Pseudocode | Yes | "Algorithm 1 summarizes our approach." |
| Open Source Code | Yes | "Our code is publicly available at https://github.com/MANGA-UOFA/LMReward" |
| Open Datasets | Yes | "We adopt two widely used datasets, Daily Dialog [30] and Open Subtitles [57], for the dialogue experiment. The Quora dataset is originally designed for paraphrase classification, containing both paraphrase and non-paraphrase pairs. The paraphrase pairs naturally form a parallel dataset for the generation purpose; following the common practice [34], we split it into 124K/4K/20K samples for training/validation/test." |
| Dataset Splits | Yes | "Daily Dialog... containing 60K/6.5K/7K samples for training/validation/test in Daily Dialog and 1M non-parallel samples in Open Subtitles. The Quora dataset... we split it into 124K/4K/20K samples for training/validation/test." (See the sketch after this table.) |
| Hardware Specification | No | The paper states, "First, the scale of the experiments in this paper is restricted by computational resources," but does not provide specific details about the hardware used (e.g., GPU models, CPU types). |
| Software Dependencies | No | The paper mentions using the "T5-Base model [42]" and the "NLTK library [33]" but does not provide version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper states, "Appendix B provides implementation details and hyperparameters of our approach." However, Appendix B is not included in the provided text, so specific experimental setup details are not available in the main body. |
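The Quora split in the table above is described only by its subset sizes (124K/4K/20K, following [34]); the exact split files are not reproduced here. Below is a minimal, hypothetical Python sketch of how such a split could be recreated from a list of paraphrase pairs. The `split_quora` helper, the fixed seed, and the shuffle order are all illustrative assumptions, not the authors' released preprocessing.

```python
import random

# Hypothetical helper: recreate a 124K/4K/20K train/valid/test split of
# Quora paraphrase pairs. The paper follows the split of [34]; the seed
# and shuffle order below are assumptions for illustration only.
TRAIN_SIZE, VALID_SIZE, TEST_SIZE = 124_000, 4_000, 20_000

def split_quora(pairs, seed=0):
    """Shuffle paraphrase pairs and slice them into train/valid/test."""
    total = TRAIN_SIZE + VALID_SIZE + TEST_SIZE
    if len(pairs) < total:
        raise ValueError("Not enough paraphrase pairs for a 124K/4K/20K split.")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    train = shuffled[:TRAIN_SIZE]
    valid = shuffled[TRAIN_SIZE:TRAIN_SIZE + VALID_SIZE]
    test = shuffled[TRAIN_SIZE + VALID_SIZE:total]
    return train, valid, test
```

For a faithful reproduction, one would instead consult the preprocessing in the paper's released code at https://github.com/MANGA-UOFA/LMReward rather than rely on this sketch.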