reproducibilityindex.ai

Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Evaluated across a range of both masked and autoregressive LMs (up to 7B parameters) on benchmark downstream tasks, MeZO-SVRG outperforms MeZO with up to 20% increase in test accuracies in both full- and partial-parameter fine-tuning settings.
Researcher Affiliation	Collaboration	(1) University of California, Berkeley, USA; (2) Amazon AI Research & Education, Santa Clara, USA; (3) Amazon AI Labs, Santa Clara, USA.
Pseudocode	Yes	The method is summarized in Algorithm 1.
Open Source Code	Yes	The code for the experiments is available at https://github.com/amazon-science/mezo_svrg.
Open Datasets	Yes	We fine-tune on tasks from the NLP GLUE and SuperGLUE benchmarks: Multi-Genre Natural Language Inference Corpus (MNLI), Stanford Question Answering Dataset (QNLI), Stanford Sentiment Treebank (SST-2), Corpus of Linguistic Acceptability (CoLA), and BoolQ (Williams et al., 2018; Wang et al., 2018; Socher et al., 2013; Warstadt et al., 2018; Wang et al., 2019).
Dataset Splits	Yes	Similar to Malladi et al. (2023), for each task, our experiments are conducted in a many-shot fine-tuning setting: 512 training examples, 256 validation examples and 256 test samples are randomly sampled from the dataset.
Hardware Specification	Yes	All experiments are run on a single GPU; specifically, we consider Nvidia A100 40GB or H100 80GB GPUs.
Software Dependencies	No	The paper mentions using "Huggingface datasets library" and "Huggingface transformers package" but does not specify exact version numbers for any software dependencies.
Experiment Setup	Yes	Setup. We evaluate on both full (FP32) and half (BF16) precision. We detail the experiment results for the BF16 setting in Appendix J. We mainly consider a prompt-free fine-tuning setting (more challenging loss landscape) but include prompted results for RoBERTa-large (Liu et al., 2019) in Appendix G. ... Further details of the experiment setup and implementation are provided in Appendices D and E.
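
The Dataset Splits row above describes drawing 512 training, 256 validation and 256 test examples at random for each task. A minimal sketch of one way to draw such splits is shown below. It is not taken from the authors' repository: the function name `sample_splits`, the seed, the pool size of 10,000, and the choice to draw all three splits disjointly from a single pool of indices are assumptions made only for illustration.

```python
import random


def sample_splits(num_examples, n_train=512, n_val=256, n_test=256, seed=0):
    """Draw disjoint train/val/test index lists without replacement.

    Hypothetical helper: the quoted text gives only the split sizes, so the
    single-pool sampling and the seed handling here are assumptions.
    """
    total = n_train + n_val + n_test
    if num_examples < total:
        raise ValueError(f"need at least {total} examples, got {num_examples}")

    rng = random.Random(seed)  # local RNG so the draw is repeatable
    picked = rng.sample(range(num_examples), total)  # unique indices

    train_idx = picked[:n_train]
    val_idx = picked[n_train:n_train + n_val]
    test_idx = picked[n_train + n_val:]
    return train_idx, val_idx, test_idx


# Example with a hypothetical dataset of 10,000 examples.
train_idx, val_idx, test_idx = sample_splits(num_examples=10_000, seed=0)
print(len(train_idx), len(val_idx), len(test_idx))  # 512 256 256
```

Fixing the seed makes the draw repeatable across runs, which matters here because the quoted text does not say which examples were sampled.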