reproducibilityindex.ai

AlpaGasus: Training a Better Alpaca with Fewer Data

Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	ALPAGASUS significantly outperforms the original ALPACA as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches > 90% performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks.
Researcher Affiliation	Collaboration	University of Maryland, College Park; Samsung Research America; University of Southern California
Pseudocode	No	The paper describes a data rating and filtering process but does not present it as pseudocode or a labeled algorithm block.
Open Source Code	Yes	Our project page is available at: https://lichang-chen.github.io/AlpaGasus/
Open Datasets	Yes	ALPACA (Taori et al., 2023) is an open-sourced model developed by Stanford University through IFT of LLaMA on a training dataset of 52,002 (instruction, input, response) samples with the responses generated by Text-Davinci-003 (teacher).
Dataset Splits	No	The paper mentions training data and test sets, but does not provide specific details for a validation dataset split.
Hardware Specification	Yes	using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters.
Software Dependencies	No	The paper names the LLMs it uses (e.g., LLaMA, GPT-4, ChatGPT, Claude) but does not list software dependencies or library versions (e.g., Python or PyTorch versions) used in its implementation.
Experiment Setup	Yes	We apply IFT for the same number of epochs as ALPACA (7B) but on fewer data, using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters.
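The table notes that the paper describes a data rating and filtering step without pseudocode, and that training reuses the ALPACA epoch count on fewer data. The sketch below illustrates both points under stated assumptions; it is not the authors' code. It assumes each sample already carries a grader score on a 0-5 scale and keeps samples scoring at least 4.5. It also assumes the Stanford Alpaca 7B defaults: 3 epochs and a global batch of 128, obtained on 4 GPUs as 4 per device with 8 gradient-accumulation steps. `Sample`, `filter_by_score`, `global_batch`, and `optimizer_steps` are hypothetical names, and the 9,000-sample subset is an illustrative figure, not a reported one.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    instruction: str
    input: str
    output: str
    score: float  # hypothetical: rating assigned by an LLM grader (assumed 0-5 scale)


def filter_by_score(samples: list[Sample], threshold: float = 4.5) -> list[Sample]:
    """Keep only samples whose grader score meets the (assumed) threshold."""
    return [s for s in samples if s.score >= threshold]


def global_batch(num_gpus: int, per_device_batch: int, grad_accum: int) -> int:
    """Effective batch size seen by one optimizer update."""
    return num_gpus * per_device_batch * grad_accum


def optimizer_steps(num_examples: int, epochs: int, batch_size: int) -> int:
    """Idealized update count: every epoch ends with one (possibly partial) batch.

    Real trainers may round differently when sharding data across GPUs.
    """
    return epochs * math.ceil(num_examples / batch_size)


if __name__ == "__main__":
    bs = global_batch(num_gpus=4, per_device_batch=4, grad_accum=8)  # assumed Alpaca setup
    full = optimizer_steps(52_002, epochs=3, batch_size=bs)     # full Alpaca set
    subset = optimizer_steps(9_000, epochs=3, batch_size=bs)    # hypothetical filtered subset
    print(bs, full, subset)  # 128 1221 213
```

With the epoch count held fixed, a filtered set of roughly 9,000 samples needs about one sixth of the optimizer updates of the full 52,002-sample set, which is where the compute savings of training on fewer data come from.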