Recursive Introspection: Teaching Language Model Agents How to Self-Improve

Authors: Yuxiao Qu, Tianjun Zhang, Naman Garg, Aviral Kumar

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that RISE enables Llama2, Llama3, and Mistral models to improve themselves with more turns on reasoning tasks, outperforming several single-turn strategies given an equal amount of inference-time computation.
Researcher Affiliation | Collaboration | Yuxiao Qu (1), Tianjun Zhang (2), Naman Garg (3), Aviral Kumar (1); (1) Carnegie Mellon University, (2) UC Berkeley, (3) MultiOn
Pseudocode | Yes | A complete algorithmic pseudocode for each approach is shown in Appendix D. ... Algorithm 1 Data Collection at Iteration T ... Algorithm 2 Inference at iteration T
Open Source Code | Yes | The code is publicly available at https://github.com/cmu-mind/RISE
Open Datasets | Yes | Specifically, on the GSM8K [12] dataset... We see similar trends on the MATH dataset [20]... The GSM8K dataset consists of 7,473 problems in the training portion and 1,319 problems in the testing portion. Similarly, the MATH dataset is divided into 7,500 problems for training and 1,000 problems for testing.
Dataset Splits | No | The GSM8K dataset consists of 7,473 problems in the training portion and 1,319 problems in the testing portion. Similarly, the MATH dataset is divided into 7,500 problems for training and 1,000 problems for testing. The training portions of both datasets are used to generate trajectories in each iteration of the RISE method, while the testing portions are held out for evaluating the performance of the models.
Hardware Specification | Yes | The hyperparameters used for finetuning are specified in Table 9. ... gpus: 4x A40
Software Dependencies | No | For finetuning, we utilize the FastChat codebase, but we customize the loss function to be weighted by reward. The base models are directly loaded from Hugging Face: Llama-2-7b-chat-hf (https://huggingface.co/meta-llama/Llama-2-7b-hf) and Mistral-7B-Instruct-v0.2. The hyperparameters used for finetuning are specified in Table 9.
Experiment Setup | Yes | The hyperparameters used for finetuning are specified in Table 9: bf16: True; epochs: 2; per-device train batch size: 1; GPUs: 4x A40; gradient accumulation steps: 16; learning rate: 1e-5; weight decay: 0; warmup ratio: 0.04; learning rate scheduler type: cosine; tf32: True; model max length: 2048.
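
For readers who want a concrete picture of the multi-turn procedure referenced in the Pseudocode row (Algorithm 2, inference), the following is a minimal Python sketch, not the authors' released implementation from the repository above. The `generate` and `is_correct` callables and the retry-prompt wording are hypothetical placeholders; the external answer check stands in for the verifier/oracle variant of stopping.

```python
# Minimal sketch of multi-turn self-improvement inference in the spirit of
# RISE's Algorithm 2 (not the authors' released code). `generate` and
# `is_correct` are hypothetical placeholders for an LLM call and an
# answer checker; the feedback prompt wording is an assumption.
from typing import Callable, Dict, List

def rise_inference(
    problem: str,
    generate: Callable[[List[Dict[str, str]]], str],
    is_correct: Callable[[str], bool],
    max_turns: int = 5,
) -> str:
    """Query the model repeatedly, feeding back its own previous attempt."""
    messages = [{"role": "user", "content": problem}]
    answer = ""
    for _ in range(max_turns):
        answer = generate(messages)          # one attempt at the problem
        if is_correct(answer):               # external verifier / oracle check
            break
        # Append the failed attempt plus a retry instruction so the next
        # turn can introspect on the earlier mistake.
        messages.append({"role": "assistant", "content": answer})
        messages.append({
            "role": "user",
            "content": "Your previous answer may be incorrect. "
                       "Please reconsider and try again.",
        })
    return answer
```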
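
The GSM8K split sizes quoted in the Dataset Splits row can be checked against the standard `gsm8k` dataset on the Hugging Face Hub. This assumes the `datasets` library; no loader is shown for MATH because the exact 7,500/1,000 subset used is not identified in this report.

```python
# Sketch of loading the GSM8K splits referenced above, assuming the
# standard "gsm8k" dataset on the Hugging Face Hub.
from datasets import load_dataset

gsm8k = load_dataset("gsm8k", "main")
print(len(gsm8k["train"]), len(gsm8k["test"]))  # expected: 7473 and 1319
```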
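
The Software Dependencies row notes that the FastChat loss is customized to be weighted by reward. The snippet below is one plausible reading of that description, written against plain PyTorch rather than the FastChat training loop; the tensor shapes and the per-trajectory `rewards` input are assumptions.

```python
# Sketch of a reward-weighted next-token loss of the kind described above.
# Not the authors' implementation; shapes and inputs are assumptions.
import torch
import torch.nn.functional as F

def reward_weighted_loss(logits, labels, rewards, ignore_index=-100):
    """Cross-entropy over shifted tokens, scaled per sequence by its reward.

    logits:  (batch, seq_len, vocab) model outputs
    labels:  (batch, seq_len) target token ids, ignore_index where masked
    rewards: (batch,) scalar reward for each trajectory
    """
    # Standard causal-LM shift: predict token t+1 from position t.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    per_token = F.cross_entropy(
        shift_logits.transpose(1, 2),   # (batch, vocab, seq_len-1)
        shift_labels,
        ignore_index=ignore_index,
        reduction="none",
    )                                    # (batch, seq_len-1)
    mask = (shift_labels != ignore_index).float()
    per_seq = (per_token * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)
    # Each sequence's average token loss is scaled by its reward.
    return (rewards * per_seq).mean()
```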
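
For reference, the Table 9 values map onto Hugging Face `TrainingArguments` roughly as follows. This is an illustrative mapping, not the authors' FastChat launch configuration; `output_dir` is a placeholder, and the model max length of 2048 is a tokenizer-side setting handled separately.

```python
# Illustrative mapping of the Table 9 hyperparameters onto Hugging Face
# TrainingArguments; this is not the authors' FastChat launch script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="rise-finetune",           # placeholder path
    bf16=True,
    tf32=True,
    num_train_epochs=2,
    per_device_train_batch_size=1,        # run on 4x A40 GPUs
    gradient_accumulation_steps=16,       # effective batch size 4 * 1 * 16 = 64
    learning_rate=1e-5,
    weight_decay=0.0,
    warmup_ratio=0.04,
    lr_scheduler_type="cosine",
)
# Model max length 2048 is applied on the tokenizer, e.g.
# tokenizer.model_max_length = 2048
```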