Time-Reversal Provides Unsupervised Feedback to LLMs
Authors: Yerram Varun, Rahul Madhavan, Sravanti Addepalli, Arun Suggala, Karthikeyan Shanmugam, Prateek Jain
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically (and theoretically in a stylized setting) that time-reversed models can indeed complement forward model predictions when used to score the query given response for re-ranking multiple forward generations. We obtain up to 5% improvement on the widely used Alpaca Eval Leaderboard over the competent baseline of best-of-N re-ranking using self log-perplexity scores. We further show that TRLM scoring outperforms conventional forward scoring of response given query, resulting in significant gains in applications such as citation generation and passage retrieval. |
| Researcher Affiliation | Collaboration | Varun Yerram (Google DeepMind), Rahul Madhavan (Indian Institute of Science), Sravanti Addepalli (Google DeepMind), Arun Suggala (Google DeepMind), Karthikeyan Shanmugam (Google DeepMind), Prateek Jain (Google DeepMind) |
| Pseudocode | Yes | Algorithm 1 TRLM-Ba.Pretrain, Algorithm 2 TRLM-Ba.Score, Algorithm 3 TRLM-Fo.Score, Algorithm 4 TRLM-FoBa.Pretrain, Algorithm 5 TRLM-Ba.Generate, Algorithm 6 TRLM-Fo.Generate |
| Open Source Code | No | We do not release any model or datasets. |
| Open Datasets | Yes | The pre-training setup for all TRLM models is identical to that of PALM2-Otter models described by Anil et al. [2023b], except for the token orders specified by our TRLM.pretrain methods for TRLM-Fo, TRLM-Ba and TRLM-FoBa respectively. We fine-tune them on the FLAN dataset [Longpre et al., 2023] using the TRLM-xx.pretrain function. ... H Licenses and Copyrights Across Assets ... 7. CNN Daily Mail Citation: [Zhong et al., 2020] Asset Link: [link] License: Apache 2.0 license ... 8. MS-Marco Citation: [Bajaj et al., 2016] Asset Link: [link] License: Microsoft Terms and Conditions ... 9. NF-Corpus Citation: [Boteva et al., 2016b] Asset Link: [link] License: Terms of Use |
| Dataset Splits | No | The paper uses established benchmarks and datasets, but does not explicitly state the training/validation/test splits with specific percentages or counts for the experiments conducted in the paper. It refers to existing datasets and their use for evaluation (e.g., test split) but does not define its own splits for training/validation. |
| Hardware Specification | Yes | To pre-train TRLM models we use two TPUv5e pods [Cloud] for two weeks in the setup described by Anil et al. [2023b]. Further details on pre-training are provided in Appendix B. We run fine-tuning on the FLAN dataset using a TPUv5e pod [Cloud] for 1 day. |
| Software Dependencies | No | The paper mentions specific datasets (e.g., the FLAN dataset) and models (e.g., PALM2-Otter, Gemini-Pro-1.0, Mixtral), but it does not specify software dependencies such as programming-language or library versions (e.g., PyTorch or TensorFlow versions). |
| Experiment Setup | Yes | We generate 16 responses using a temperature τ = 0.8 to ensure diversity of answers. We then rerank the responses using different variants of TRLM from the PALM2-Otter family of models (TRLM training details in the supplement). We further consider two baselines, Self scoring and Forward Baselines, as described in Table 1. Scoring prompts and Conditioning prompts used with various TRLM variants for this task are described in the Table 7 of Appendix C.1. |
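The re-ranking pipeline described in the rows above (sample N diverse responses from a forward model, then pick the one a time-reversed model scores best for P(query | response)) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `overlap_score` is a hypothetical stand-in for a TRLM's log-probability of the query given the response, and `trlm_rerank` only shows the best-of-N selection flow.

```python
import re

def _words(text):
    """Lowercase word tokens; a crude stand-in for model tokenization."""
    return re.findall(r"[a-z']+", text.lower())

def overlap_score(query, response):
    # Hypothetical proxy for a time-reversed model's log P(query | response):
    # the fraction of query words that also appear in the response. A real
    # TRLM-Ba would score the query tokens, in reverse order, conditioned on
    # the response; this toy scorer only makes the example runnable.
    q = _words(query)
    r = set(_words(response))
    return sum(w in r for w in q) / max(len(q), 1)

def trlm_rerank(query, responses, score_fn):
    # Best-of-N re-ranking: the paper samples N = 16 responses at
    # temperature 0.8 for diversity, then orders them by the reverse
    # score of the query given each response, best first.
    return sorted(responses, key=lambda resp: score_fn(query, resp), reverse=True)

candidates = [
    "I enjoy long walks on the beach.",
    "Paris is the capital of France.",
]
ranked = trlm_rerank("What is the capital of France?", candidates, overlap_score)
```

Swapping `score_fn` recovers the baselines in Table 1: forward scoring uses log P(response | query) from the forward model, and self-scoring uses the forward model's own log-perplexity of the response.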