Star-Agents: Automatic Data Optimization with LLM Agents for Instruction Tuning
Authors: Hang Zhou, Yehui Tang, Haochen Qin, Yujie Yang, Renren Jin, Deyi Xiong, Kai Han, Yunhe Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies, including instruction tuning experiments with models such as Pythia and LLaMA, demonstrate the effectiveness of the proposed framework. |
| Researcher Affiliation | Collaboration | Hang Zhou1,2, Yehui Tang2, Haochen Qin2, Yujie Yang2, Renren Jin1, Deyi Xiong1, Kai Han2, Yunhe Wang2. 1College of Intelligence and Computing, Tianjin University, Tianjin, China. 2Huawei Noah's Ark Lab. |
| Pseudocode | No | The paper includes a diagram (Figure 1) but does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | Codes will be released soon |
| Open Datasets | Yes | In alignment with WizardLM [44], we adopted the Supervised Fine-Tuning (SFT) dataset, designated as the Evol-Instruct dataset, which consists of 70,000 instruction-response pairs. [...] For further enriching our comparative analysis, we employed the Alpaca dataset [32], comprising 52,000 instruction-following samples. |
| Dataset Splits | No | The paper mentions using the Evol-Instruct and Alpaca datasets for fine-tuning but does not explicitly provide details on how these datasets were split into training, validation, and test sets for their experiments, or if specific predefined splits were used for validation. |
| Hardware Specification | No | The paper does not specify the exact hardware used for running the experiments (e.g., specific GPU models, CPU types, or memory sizes). Appendix A.5 discusses computational load of LLM agents, not the experimental hardware. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Fast-Chat [54]' and 'GPT-4' but does not provide specific version numbers for these or any other software libraries or dependencies used in the experiments. |
| Experiment Setup | Yes | We fine-tuned our models (Pythia-1B and Llama-2-7B) over three epochs using the Adam optimizer, with an initial learning rate of 2 × 10⁻⁵, a maximum token count of 2048, and a batch size of 64. (A hedged configuration sketch follows the table.) |
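
The reported hyperparameters are enough to sketch a plausible fine-tuning configuration with the Hugging Face Trainer. The snippet below is a hedged reconstruction, not the authors' code (which is unreleased): the hub model and dataset IDs, the prompt formatting, the AdamW optimizer variant, and the per-device batch / gradient-accumulation split are all assumptions; only the three epochs, the 2 × 10⁻⁵ learning rate, the 2048-token limit, and the effective batch size of 64 come from the paper.

```python
# Hypothetical reproduction sketch -- not the authors' released code.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "EleutherAI/pythia-1b"  # or "meta-llama/Llama-2-7b-hf"
MAX_LENGTH = 2048                    # maximum token count reported in the paper

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # Pythia/Llama tokenizers ship without a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Placeholder SFT data: the public Alpaca release (hub ID assumed). The paper's
# optimized Evol-Instruct data cannot be used here while the code is unreleased.
raw = load_dataset("tatsu-lab/alpaca", split="train")

def tokenize(example):
    # Prompt formatting is not described in the paper; plain concatenation is an assumption.
    text = example["instruction"] + "\n" + example["input"] + "\n" + example["output"]
    return tokenizer(text, truncation=True, max_length=MAX_LENGTH)

train_dataset = raw.map(tokenize, remove_columns=raw.column_names)

# Reported hyperparameters: 3 epochs, Adam, lr 2e-5, batch size 64.
# The per-device/accumulation split and the AdamW variant are assumptions.
args = TrainingArguments(
    output_dir="star-agents-sft",
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=8,   # effective batch size of 64
    optim="adamw_torch",
    logging_steps=10,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Swapping `MODEL_NAME` to the Llama-2-7B checkpoint covers the second reported setting under the same assumptions; the paper does not state the training framework, scheduler, warmup, or precision, so those are left at Trainer defaults here.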