AlpaGasus: Training a Better Alpaca with Fewer Data

Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | ALPAGASUS significantly outperforms the original ALPACA, as evaluated by GPT-4 on multiple test sets and by a controlled human evaluation. Its 13B variant matches >90% of the performance of its teacher LLM (i.e., Text-Davinci-003, which generated the 52k data) on test tasks.
Researcher Affiliation | Collaboration | University of Maryland, College Park; Samsung Research America; University of Southern California
Pseudocode | No | The paper describes a data rating and filtering process but does not present it as pseudocode or a labeled algorithm block (a hedged sketch of such a filtering step is given below the table).
Open Source Code | Yes | Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
Open Datasets | Yes | ALPACA (Taori et al., 2023) is an open-sourced model developed by Stanford University through IFT of LLaMA on a training dataset of 52,002 (instruction, input, response) samples with the responses generated by Text-Davinci-003 (teacher).
Dataset Splits | No | The paper mentions training data and test sets but does not specify a validation split.
Hardware Specification | Yes | using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters.
Software Dependencies | No | The paper names the LLMs used (e.g., LLaMA, GPT-4, ChatGPT, Claude) but does not give specific software dependencies or library versions (e.g., Python or PyTorch versions) used for implementation.
Experiment Setup | Yes | We apply IFT for the same number of epochs as ALPACA (7B) but on fewer data, using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters. (A configuration sketch of this recipe is given below the table.)
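
The data rating and filtering procedure noted in the Pseudocode row is only described in prose in the paper. The snippet below is a minimal sketch of such a step, assuming an LLM judge (here gpt-3.5-turbo via the OpenAI Python client) scores each (instruction, input, response) triplet and only high-scoring triplets are kept; the prompt wording, score scale, and 4.5 threshold are illustrative assumptions rather than the authors' exact setup.

```python
import json
import re

from openai import OpenAI  # assumes the OpenAI Python client is installed and OPENAI_API_KEY is set

client = OpenAI()

# Hypothetical rating prompt; the paper's actual prompt is not reproduced here.
RATING_PROMPT = (
    "Rate the quality of the response to the instruction below on a scale from 0 to 5, "
    "where 5 means the response is accurate and helpful. Reply with the score only.\n\n"
    "Instruction: {instruction}\nInput: {input}\nResponse: {output}"
)


def rate_triplet(sample: dict) -> float:
    """Ask the LLM judge for a numeric quality score of one (instruction, input, response) triplet."""
    prompt = RATING_PROMPT.format(
        instruction=sample["instruction"],
        input=sample.get("input", ""),
        output=sample["output"],
    )
    reply = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
    )
    text = reply.choices[0].message.content or ""
    match = re.search(r"\d+(\.\d+)?", text)  # pull the first number out of the judge's reply
    return float(match.group()) if match else 0.0


def filter_dataset(path_in: str, path_out: str, threshold: float = 4.5) -> None:
    """Keep only the triplets whose judge score reaches the threshold."""
    with open(path_in) as f:
        data = json.load(f)
    kept = [sample for sample in data if rate_triplet(sample) >= threshold]
    with open(path_out, "w") as f:
        json.dump(kept, f, indent=2)


# Example: filter the 52k ALPACA triplets down to a high-scoring subset for IFT.
# filter_dataset("alpaca_data.json", "alpaca_filtered.json", threshold=4.5)
```

Applied to the 52,002 ALPACA triplets, a filter of this kind retains only a small high-scoring subset (roughly 9k samples in the paper), which is the data ALPAGASUS is fine-tuned on.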
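
The Experiment Setup row states that IFT reuses the original ALPACA hyperparameters on 4 NVIDIA A100 (80GB) GPUs. The configuration below is a sketch of that recipe using Hugging Face transformers.TrainingArguments; the specific values mirror the publicly documented Stanford Alpaca fine-tuning settings and are assumptions here, not numbers reported in the table above.

```python
from transformers import TrainingArguments

# Sketch of an IFT configuration mirroring the published Stanford Alpaca recipe.
# All values are assumptions taken from that public recipe, not from this paper.
training_args = TrainingArguments(
    output_dir="./alpagasus-7b",
    num_train_epochs=3,                  # same number of epochs as ALPACA (7B)
    per_device_train_batch_size=4,       # 4 GPUs x 4 samples x 8 accumulation steps = batch size 128
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    weight_decay=0.0,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    bf16=True,                           # A100 (80GB) GPUs support bfloat16 training
    logging_steps=1,
    save_strategy="steps",
    save_steps=2000,
    save_total_limit=1,
    fsdp="full_shard auto_wrap",         # shard the LLaMA weights across the 4 GPUs
    fsdp_transformer_layer_cls_to_wrap="LlamaDecoderLayer",
)
```

Under this reading, the only change relative to the original ALPACA run is the training file: the filtered subset produced by the rating step above replaces the full 52k-sample dataset.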