reproducibilityindex.ai

NEFTune: Noisy Embeddings Improve Instruction Finetuning

Authors: Neel Jain, Ping-yeh Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, Gowthami Somepalli, Brian R. Bartoldson, Bhavya Kailkhura, Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein

ICLR 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Standard finetuning of LLaMA-2-7B using Alpaca achieves 29.79% on AlpacaEval, which rises to 64.69% using noisy embeddings.
Researcher Affiliation	Collaboration	University of Maryland; Lawrence Livermore National Laboratory; New York University
Pseudocode	Yes	Algorithm 1 NEFTune: Noisy Embedding Instruction Finetuning
Open Source Code	Yes	Code is available on Github: https://github.com/neelsjain/NEFTune.
Open Datasets	Yes	Alpaca (Taori et al., 2023) was constructed using the Self-Instruct method of Wang et al. (2022), and the Text-Davinci-003 model (Ouyang et al., 2022). [...] Evol-Instruct (Xu et al., 2023) contains 70k single-turn instructions [...]. Open-Platypus (Lee et al., 2023) is a curated dataset amalgamated from 11 open-source datasets [...]. ShareGPT (Chiang et al., 2023) is a dataset of 70K voluntarily-shared ChatGPT conversations (ShareGPT, 2023).
Dataset Splits	No	The paper describes hyperparameter tuning through a 'coarse sweep on LLaMA-1 (7B) trained on the Alpaca dataset' but does not specify explicit training, validation, and test dataset splits with percentages or counts.
Hardware Specification	Yes	We finetune the 7B parameter models on four A5000s and 13B parameters on eight A5000s using bfloat16 precision.
Software Dependencies	No	The paper mentions 'bfloat16 precision' and 'open source software' but does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	We use learning rate of 5e-5 and the Adam optimizer for all 7B models... We train all models for 3 epochs on all datasets setting the same seed for each run with an effective batch size of 128 (4 cards, batch size 4, 8 gradient accumulation steps).
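The Pseudocode row cites Algorithm 1 without reproducing it. Here is a minimal sketch of the noise step, based on our reading of that algorithm. Verify it against the linked repository before relying on it.

The step works as follows:
- During each training forward pass, the token embeddings of a sequence (length L, dimension d) receive uniform noise drawn from [-1, 1].
- That noise is scaled by alpha / sqrt(L * d).
- At inference the embeddings are left clean.

The function name `add_neftune_noise`, the default alpha of 5.0, and the list-of-lists representation are illustrative assumptions, not the authors' code.

```python
import math
import random
from typing import List, Optional

Vector = List[float]


def add_neftune_noise(
    embeddings: List[Vector], alpha: float = 5.0, rng: Optional[random.Random] = None
) -> List[Vector]:
    """Return a noisy copy of one sequence's token embeddings (L rows of d floats).

    Hypothetical helper: each entry gets Uniform(-1, 1) noise scaled by
    alpha / sqrt(L * d), so no entry moves by more than that scale.
    The input list is not modified.
    """
    if not embeddings:
        return []
    rng = rng if rng is not None else random.Random()
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[x + scale * rng.uniform(-1.0, 1.0) for x in row] for row in embeddings]


# Usage: apply only on training forward passes; feed clean embeddings at inference.
emb = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.4, -0.1, 0.2]]  # L = 2 tokens, d = 4
noisy = add_neftune_noise(emb, alpha=5.0, rng=random.Random(0))
```

Dividing by sqrt(L * d) keeps the expected norm of the total perturbation roughly constant as sequences get longer or embeddings wider. As a result, alpha behaves as a single scale knob across models and datasets.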
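The Experiment Setup row's effective batch size of 128 follows from multiplying its three factors: devices, per-device batch size, and gradient accumulation steps. The sketch below collects the quoted 7B hyperparameters in a plain dictionary and checks that product. The key names are hypothetical and do not correspond to any particular training framework's configuration schema.

```python
# Hypothetical config keys summarizing the 7B setup quoted in the table.
config_7b = {
    "optimizer": "Adam",
    "learning_rate": 5e-5,
    "epochs": 3,
    "precision": "bfloat16",
    "num_gpus": 4,               # four A5000s
    "per_device_batch_size": 4,
    "grad_accum_steps": 8,
}


def effective_batch_size(cfg: dict) -> int:
    """Number of examples contributing to one optimizer step across all devices."""
    return cfg["num_gpus"] * cfg["per_device_batch_size"] * cfg["grad_accum_steps"]


print(effective_batch_size(config_7b))  # 128
```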