SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

Authors: Liangxin Liu, Xuebo Liu, Derek F. Wong, Dongfang Li, Ziyi Wang, Baotian Hu, Min Zhang

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. |
| Researcher Affiliation | Academia | Liangxin Liu, Xuebo Liu, Dongfang Li, Ziyi Wang, Baotian Hu, and Min Zhang: Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China. Derek F. Wong: NLP2CT Lab, Department of Computer and Information Science, University of Macau. |
| Pseudocode | No | The paper describes the proposed methods using text and mathematical equations, but it does not include any clearly labeled "Pseudocode" or "Algorithm" blocks, nor does it present structured steps in a code-like format. |
| Open Source Code | Yes | Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT. |
| Open Datasets | Yes | We apply SelectIT to the widely used Alpaca-GPT4 (Peng et al., 2023a). Specifically, we use the most popular LLaMA-2 (7B, 13B, 70B) as our foundation models... We further validate the robustness of SelectIT by deploying it on two additional, widely utilized datasets: WizardLM (Xu et al., 2023) and Orca-GPT4 (Subhabrata & Arindam, 2023). |
| Dataset Splits | No | The paper mentions fine-tuning for a certain number of epochs and evaluating on various benchmarks, but it does not explicitly specify the division of the dataset into training, validation, and test splits with percentages or counts. |
| Hardware Specification | Yes | Using SelectIT, we employ 4 A800 80G GPUs to select high-quality IT data, calculating the total cost based on Google Cloud's rate of $1.15/h per single GPU. (See the cost sketch below the table.) |
| Software Dependencies | No | The paper describes various parameters and optimizers used (e.g., "Adam with β1 = 0.9, β2 = 0.999", "cosine learning rate scheduler"), but it does not provide specific version numbers for core software dependencies such as programming languages or deep learning frameworks. |
| Experiment Setup | Yes | We fine-tune it for 3 epochs, with a batch size of 128. We use Adam with β1 = 0.9, β2 = 0.999, and the cosine learning rate scheduler starts from 2e-5 and decays to 0. We opted for a 4096 input length because it can show the best performance of LLMs. We employ beam = 4 for decoding. We set the temperature parameter to 0.8 and the top-p sampling parameter to 0.9 to improve the originality of the output text while ensuring the accuracy and relevance of the content. (See the configuration sketch below the table.) |
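The Experiment Setup row maps directly onto standard Hugging Face configuration objects. Below is a minimal sketch, assuming the transformers library; the output directory, the per-device batch/accumulation split, and the pairing of beam search with sampling follow the quoted numbers but are illustrative assumptions, not the authors' released training script.

```python
# Minimal sketch of the reported setup: 3 epochs, effective batch size 128,
# Adam(beta1=0.9, beta2=0.999), cosine schedule decaying from 2e-5 to 0,
# 4096-token inputs, and beam-4 / temperature-0.8 / top-p-0.9 decoding.
# output_dir and the per-device/accumulation split are assumptions.
from transformers import TrainingArguments, GenerationConfig

training_args = TrainingArguments(
    output_dir="selectit-alpaca",       # assumed name
    num_train_epochs=3,                 # "fine-tune it for 3 epochs"
    per_device_train_batch_size=4,      # 4 GPUs x 4 x 8 accumulation = 128 (assumed split)
    gradient_accumulation_steps=8,
    learning_rate=2e-5,                 # cosine decay from 2e-5 to 0
    lr_scheduler_type="cosine",
    adam_beta1=0.9,
    adam_beta2=0.999,
)

generation_config = GenerationConfig(
    num_beams=4,        # "beam = 4 for decoding"
    do_sample=True,     # sampling with the reported temperature and top-p
    temperature=0.8,
    top_p=0.9,
    max_length=4096,    # 4096 input length reported as best-performing
)
```

These objects would then be passed to a Trainer and to model.generate, respectively; the scripts in the linked repository remain the authoritative reference for the actual setup.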
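The Hardware Specification row likewise lends itself to a quick cost check. In the sketch below, the GPU count and hourly rate are taken from the quoted text, while the selection wall-clock time is a hypothetical placeholder, since this section does not report it.

```python
# Back-of-the-envelope cost estimate for the data-selection stage.
num_gpus = 4               # A800 80G GPUs (quoted)
rate_per_gpu_hour = 1.15   # USD per GPU-hour, quoted Google Cloud rate
selection_hours = 10.0     # hypothetical wall-clock time, not reported here

total_cost_usd = num_gpus * rate_per_gpu_hour * selection_hours
print(f"Estimated selection cost: ${total_cost_usd:.2f}")  # $46.00 for the placeholder duration
```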