Time-Reversal Provides Unsupervised Feedback to LLMs

Authors: Yerram Varun, Rahul Madhavan, Sravanti Addepalli, Arun Suggala, Karthikeyan Shanmugam, Prateek Jain

NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We show empirically (and theoretically in a stylized setting) that time-reversed models can indeed complement forward model predictions when used to score the query given response for re-ranking multiple forward generations. We obtain up to 5% improvement on the widely used AlpacaEval Leaderboard over the competent baseline of best-of-N re-ranking using self log-perplexity scores. We further show that TRLM scoring outperforms conventional forward scoring of response given query, resulting in significant gains in applications such as citation generation and passage retrieval." (A minimal sketch of this reverse-scoring re-rank appears after the table.) |
| Researcher Affiliation | Collaboration | Varun Yerram (Google DeepMind), Rahul Madhavan (Indian Institute of Science), Sravanti Addepalli (Google DeepMind), Arun Suggala (Google DeepMind), Karthikeyan Shanmugam (Google DeepMind), Prateek Jain (Google DeepMind) |
| Pseudocode | Yes | Algorithm 1 TRLM-Ba.Pretrain, Algorithm 2 TRLM-Ba.Score, Algorithm 3 TRLM-Fo.Score, Algorithm 4 TRLM-FoBa.Pretrain, Algorithm 5 TRLM-Ba.Generate, Algorithm 6 TRLM-Fo.Generate. (A token-reversal sketch of the TRLM-Ba idea appears after the table.) |
| Open Source Code | No | "We do not release any model or datasets." |
| Open Datasets | Yes | "The pre-training setup for all TRLM models is identical to that of PaLM2-Otter models described by Anil et al. [2023b], except for the token orders specified by our TRLM.pretrain methods for TRLM-Fo, TRLM-Ba and TRLM-FoBa respectively. We fine-tune them on the FLAN dataset [Longpre et al., 2023] using the TRLM-xx.pretrain function." Appendix H (Licenses and Copyrights Across Assets) further lists, among others: CNN Daily Mail [Zhong et al., 2020], Apache 2.0 license; MS MARCO [Bajaj et al., 2016], Microsoft Terms and Conditions; NF-Corpus [Boteva et al., 2016b], Terms of Use. |
| Dataset Splits | No | The paper uses established benchmarks and datasets but does not explicitly state training/validation/test splits (as percentages or counts) for its own experiments. It refers to existing datasets and their evaluation splits (e.g., the test split) but does not define its own training/validation splits. |
| Hardware Specification | Yes | "To pre-train TRLM models we use two TPUv5e pods [Cloud] for two weeks in the setup described by Anil et al. [2023b]. Further details on pre-training are provided in Appendix B. We run fine-tuning on the FLAN dataset using a TPUv5e pod [Cloud] for 1 day." |
| Software Dependencies | No | The paper mentions specific datasets (e.g., the FLAN dataset) and models (e.g., PaLM2-Otter, Gemini-Pro-1.0, Mixtral), but does not specify software dependencies such as programming-language or library versions (e.g., PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | "We generate 16 responses using a temperature τ = 0.8 to ensure diversity of answers. We then rerank the responses using different variants of TRLM from the PaLM2-Otter family of models (TRLM training details in the supplement). We further consider two baselines, Self scoring and Forward Baselines, as described in Table 1. Scoring prompts and Conditioning prompts used with various TRLM variants for this task are described in Table 7 of Appendix C.1." (An end-to-end best-of-N sketch appears below.) |
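
To make the scoring direction in the Research Type row concrete, here is a minimal sketch of re-ranking forward generations by scoring the query given each response. `reverse_log_prob` is a hypothetical stand-in for a time-reversed LM's sequence log-likelihood, not the paper's implementation.

```python
# Minimal sketch of TRLM-style re-ranking: pick, among N forward generations,
# the response under which the query itself is most likely.
# `reverse_log_prob(query, response)` is a hypothetical stand-in for a
# time-reversed LM's log P(query | response); it is NOT the paper's code.

from typing import Callable, List

def rerank_by_reverse_score(
    query: str,
    responses: List[str],
    reverse_log_prob: Callable[[str, str], float],
) -> str:
    # Forward scoring would rank by log P(response | query); TRLM scoring
    # ranks by log P(query | response) instead.
    return max(responses, key=lambda r: reverse_log_prob(query, r))
```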
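The Pseudocode row names token-reversal algorithms; the sketch below illustrates the structure of TRLM-Ba.Pretrain and TRLM-Ba.Score under assumed `tokenize` and `model_log_prob` interfaces: pretrain a standard next-token LM on token-reversed sequences, then score the query given the response over the reversed text. This is one plausible reading of Algorithms 1 and 2, not the paper's code.

```python
# Illustration of the token-reversal idea behind TRLM-Ba.Pretrain and
# TRLM-Ba.Score (Algorithms 1-2). `tokenize` and `model_log_prob` are assumed
# interfaces: a tokenizer, and a next-token LM's log-likelihood of a target
# continuation given a conditioning context.

from typing import Callable, List

def reverse_tokens(ids: List[int]) -> List[int]:
    # TRLM-Ba pretraining feeds an ordinary next-token training loop with
    # sequences whose token order has been reversed.
    return ids[::-1]

def trlm_ba_score(
    query: str,
    response: str,
    tokenize: Callable[[str], List[int]],
    model_log_prob: Callable[[List[int], List[int]], float],
) -> float:
    # Score log P_reverse(query | response): condition on the reversed
    # response tokens and score the reversed query tokens, matching the
    # reversed order the model was pretrained on.
    context = reverse_tokens(tokenize(response))
    target = reverse_tokens(tokenize(query))
    return model_log_prob(context, target)
```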
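Finally, the Experiment Setup row translates into the best-of-N sketch below: sample 16 candidates at temperature 0.8, then re-rank. `sample_response` and the scoring callables are assumed interfaces, and the scoring/conditioning prompts from Table 7 are omitted.

```python
# End-to-end sketch of the reported setup: 16 samples at temperature 0.8,
# then best-of-N re-ranking. `sample_response` and `score` are assumed
# interfaces; prompts (Table 7 of Appendix C.1) are omitted for brevity.

from typing import Callable, List

N_SAMPLES = 16
TEMPERATURE = 0.8

def best_of_n(
    query: str,
    sample_response: Callable[[str, float], str],
    score: Callable[[str, str], float],
) -> str:
    candidates = [sample_response(query, TEMPERATURE) for _ in range(N_SAMPLES)]
    return max(candidates, key=lambda r: score(query, r))

# TRLM re-ranking uses score(query, r) = log P_reverse(query | r).
# The self-scoring baseline instead ranks by the forward model's own
# length-normalized log-likelihood of the response, i.e. its negative
# self log-perplexity.
```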