A Critical Evaluation of AI Feedback for Aligning Large Language Models

Authors: Archit Sharma, Sedrick Scott Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we find that two conditions are necessary for RLAIF to significantly outperform SFT: (a) a sufficiently strong pre-trained base model and (b) a capability mismatch between the teacher used for the SFT data collection and the critic used for collecting AI feedback.
Researcher Affiliation | Collaboration | Archit Sharma, Sedrick Keh, Eric Mitchell, Chelsea Finn, Kushal Arora, Thomas Kollar (Stanford University, Toyota Research Institute)
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at: https://github.com/architsharma97/dpo-rlaif.
Open Datasets | Yes | To this end, we fix the dataset of prompts to be single-turn instructions derived from ShareGPT [Chiang et al., 2023].
Dataset Splits | Yes | Therefore, we use 10% of the available prompts for the SFT stage and the rest of them to generate the AIF dataset.
Hardware Specification | Yes | Training was done on A100 80GB instances and took around 1 hour per epoch for a 7B model when trained on 100% of the training examples.
Software Dependencies | No | The paper mentions software like "Adam optimizer" but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | For SFT runs, we train the models on 9 epochs and evaluate every 3 epochs. From here, we select the best checkpoint to report. We use a batch size of 8 and conduct a hyperparameter sweep for learning rate across {1e-7, 5e-7, 1e-6}.
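The Dataset Splits row above describes a simple partition of the prompt pool: roughly 10% of the prompts go to the SFT stage and the remainder are used to generate the AI-feedback (AIF) dataset. The sketch below only illustrates that split; the prompt list, fixed seed, and variable names are assumptions for illustration and are not taken from the paper or its released code.

```python
import random

# Hypothetical pool of single-turn instruction prompts (e.g., derived from ShareGPT).
prompts = [f"prompt_{i}" for i in range(50_000)]

# Assumption: shuffle with a fixed seed before splitting; the paper only states
# that 10% of prompts are used for SFT and the rest for AI-feedback collection.
random.seed(0)
random.shuffle(prompts)

split_idx = int(0.1 * len(prompts))
sft_prompts = prompts[:split_idx]   # ~10% of prompts for the SFT stage
aif_prompts = prompts[split_idx:]   # remaining ~90% for generating the AIF dataset
```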
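The Experiment Setup row reports a batch size of 8, training for 9 epochs with evaluation every 3 epochs, a learning-rate sweep over {1e-7, 5e-7, 1e-6}, and selection of the best checkpoint for reporting. A minimal sketch of such a sweep, assuming a hypothetical train_and_evaluate helper and an unspecified evaluation metric, is shown below; it is not the authors' training code.

```python
# Hyperparameters reported in the paper's experiment setup.
SFT_CONFIG = {
    "batch_size": 8,
    "num_epochs": 9,
    "eval_every_epochs": 3,
    "learning_rates": [1e-7, 5e-7, 1e-6],
}

def train_and_evaluate(learning_rate: float, config: dict) -> tuple[float, str]:
    """Hypothetical helper: run SFT at this learning rate (batch size 8, 9 epochs,
    evaluating every 3 epochs) and return (best eval score, best checkpoint path)."""
    raise NotImplementedError

def sweep(config: dict) -> str:
    """Return the best checkpoint across the learning-rate sweep,
    mirroring the paper's 'select the best checkpoint to report' step."""
    best_score, best_ckpt = float("-inf"), None
    for lr in config["learning_rates"]:
        score, ckpt = train_and_evaluate(lr, config)
        if score > best_score:
            best_score, best_ckpt = score, ckpt
    return best_ckpt
```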