reproducibilityindex.ai

A Critical Evaluation of AI Feedback for Aligning Large Language Models

Authors: Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our experiments, we find that two conditions are necessary for LAIF to significantly outperform SFT: (a) a sufficiently strong pre-trained base model and, (b) a capability mismatch between the teacher used for the SFT data collection and the critic used for collecting AI feedback.
Researcher Affiliation	Collaboration	Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar; Stanford University, Toyota Research Institute
Pseudocode	No	The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Code is available at: https://github.com/architsharma97/dpo-rlaif.
Open Datasets	Yes	To this end, we fix the dataset of prompts to be single-turn instructions derived from ShareGPT [Chiang et al., 2023].
Dataset Splits	Yes	Therefore, we use 10% of the available prompts for the SFT stage and the rest of them to generate the AIF dataset.
Hardware Specification	Yes	Training was done on A100 80GB instances and took around 1 hour per epoch for a 7B model when trained on 100% of the training examples.
Software Dependencies	No	The paper mentions software like "Adam optimizer" but does not specify version numbers for any software dependencies.
Experiment Setup	Yes	For SFT runs, we train the models on 9 epochs and evaluate every 3 epochs. From here, we select the best checkpoint to report. We use a batch size of 8 and conduct a hyperparameter sweep for learning rate across {1e-7, 5e-7, 1e-6}.
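
The Dataset Splits and Experiment Setup rows can be read together as a small configuration. The sketch below is illustrative only and is not taken from the authors' repository: the function names `split_prompts` and `sft_sweep_configs`, the seeded shuffle, and the dictionary keys are all hypothetical. It assumes the learning rates are 1e-7, 5e-7, and 1e-6, and that the 10% SFT subset is drawn at random, which the quoted text does not state.

```python
import random

# Values quoted in the table above; the variable names are hypothetical.
SFT_FRACTION = 0.10          # 10% of prompts go to the SFT stage
SFT_EPOCHS = 9               # total SFT training epochs
EVAL_EVERY = 3               # evaluate every 3 epochs
BATCH_SIZE = 8
LEARNING_RATES = [1e-7, 5e-7, 1e-6]


def split_prompts(prompts, sft_fraction=SFT_FRACTION, seed=0):
    """Split prompts into an SFT subset and an AI-feedback subset.

    Assumption: the split is a seeded random shuffle; the source only
    says that 10% of prompts are used for SFT and the rest for AIF.
    """
    rng = random.Random(seed)
    shuffled = list(prompts)
    rng.shuffle(shuffled)
    n_sft = int(round(len(shuffled) * sft_fraction))
    return shuffled[:n_sft], shuffled[n_sft:]


def sft_sweep_configs():
    """Enumerate one SFT run configuration per learning rate."""
    eval_epochs = list(range(EVAL_EVERY, SFT_EPOCHS + 1, EVAL_EVERY))
    return [
        {
            "learning_rate": lr,
            "batch_size": BATCH_SIZE,
            "epochs": SFT_EPOCHS,
            "eval_epochs": eval_epochs,
        }
        for lr in LEARNING_RATES
    ]
```

Under these assumptions, 100 prompts would yield 10 SFT prompts and 90 AI-feedback prompts, and the sweep would produce three runs, each evaluated after epochs 3, 6, and 9, with the best checkpoint reported.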