Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
AlpaGasus: Training a Better Alpaca with Fewer Data
Authors: Lichang Chen, Shiyang Li, Jun Yan, Hai Wang, Kalpa Gunaratna, Vikas Yadav, Zheng Tang, Vijay Srinivasan, Tianyi Zhou, Heng Huang, Hongxia Jin
ICLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | ALPAGASUS significantly outperforms the original ALPACA as evaluated by GPT-4 on multiple test sets and the controlled human evaluation. Its 13B variant matches > 90% performance of its teacher LLM (i.e., Text-Davinci-003 generating the 52k data) on test tasks. |
| Researcher Affiliation | Collaboration | University of Maryland, College Park; Samsung Research America; University of Southern California |
| Pseudocode | No | The paper describes a data rating and filtering process but does not present it as pseudocode or a labeled algorithm block. |
| Open Source Code | Yes | Our project page is available at: https://lichang-chen.github.io/AlpaGasus/ |
| Open Datasets | Yes | ALPACA (Taori et al., 2023) is an open-sourced model developed by Stanford University through IFT of LLaMA on a training dataset of 52,002 (instruction, input, response) samples with the responses generated by Text-Davinci-003 (teacher). |
| Dataset Splits | No | The paper mentions training data and test sets, but does not provide specific details for a validation dataset split. |
| Hardware Specification | Yes | using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters. |
| Software Dependencies | No | The paper mentions the LLMs used (e.g., LLaMA, GPT-4, ChatGPT, Claude) but does not provide specific software dependencies or library versions (e.g., Python or PyTorch versions) used for implementation. |
| Experiment Setup | Yes | We apply IFT for the same number of epochs as ALPACA(7B) but on fewer data, using 4 NVIDIA A100 (80GB) GPUs and following the original ALPACA setting and hyperparameters. |