Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models

Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluated across a range of both masked and autoregressive LMs (up to 7B parameters) on benchmark downstream tasks, MeZO-SVRG outperforms MeZO with up to a 20% increase in test accuracy in both full- and partial-parameter fine-tuning settings.
Researcher Affiliation | Collaboration | 1University of California, Berkeley, USA; 2Amazon AI Research & Education, Santa Clara, USA; 3Amazon AI Labs, Santa Clara, USA.
Pseudocode | Yes | The method is summarized in Algorithm 1. (An illustrative sketch of the variance-reduced zeroth-order update is given after the table.)
Open Source Code | Yes | The code for the experiments is available at https://github.com/amazon-science/mezo_svrg.
Open Datasets | Yes | We fine-tune on tasks from the NLP GLUE and SuperGLUE benchmarks: Multi-Genre Natural Language Inference Corpus (MNLI), Stanford Question Answering Dataset (QNLI), Stanford Sentiment Treebank (SST-2), Corpus of Linguistic Acceptability (CoLA), and BoolQ (Williams et al., 2018; Wang et al., 2018; Socher et al., 2013; Warstadt et al., 2018; Wang et al., 2019).
Dataset Splits | Yes | Similar to Malladi et al. (2023), for each task, our experiments are conducted in a many-shot fine-tuning setting: 512 training examples, 256 validation examples, and 256 test samples are randomly sampled from the dataset. (One way to draw such a split is sketched after the table.)
Hardware Specification | Yes | All experiments are run on a single GPU; specifically, we consider Nvidia A100 40GB or H100 80GB GPUs.
Software Dependencies | No | The paper mentions using the "Huggingface datasets library" and "Huggingface transformers package" but does not specify exact version numbers for any software dependencies.
Experiment Setup | Yes | Setup. We evaluate on both full (FP32) and half (BF16) precision. We detail the experiment results for the BF16 setting in Appendix J. We mainly consider a prompt-free fine-tuning setting (more challenging loss landscape) but include prompted results for RoBERTa-large (Liu et al., 2019) in Appendix G. ... Further details of the experiment setup and implementation are provided in Appendices D and E. (A sketch of the precision settings is given after the table.)
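
To make the variance-reduction idea concrete, here is a minimal NumPy sketch that combines a two-point (SPSA-style) zeroth-order gradient estimate with an SVRG-style control variate. This is not the paper's Algorithm 1: `loss_fn`, the step size, perturbation scale `mu`, and anchor-refresh period `q` are illustrative placeholders, and the released MeZO-SVRG code operates in-place on model parameters to stay memory-efficient.

```python
import numpy as np

def spsa_grad(loss_fn, w, batch, z, mu=1e-3):
    """Two-point zeroth-order (SPSA) gradient estimate of loss_fn at w along direction z.

    `loss_fn(w, batch) -> float` is a hypothetical scalar-loss callable; `mu` is the
    perturbation scale.
    """
    return (loss_fn(w + mu * z, batch) - loss_fn(w - mu * z, batch)) / (2.0 * mu) * z


def variance_reduced_zo(loss_fn, w, data, lr=1e-4, n_steps=1000, q=10,
                        batch_size=16, seed=0):
    """SVRG-style control variate on top of minibatch ZO estimates (illustrative only)."""
    rng = np.random.default_rng(seed)
    w_anchor, z_full = w.copy(), rng.standard_normal(w.shape)
    g_full = spsa_grad(loss_fn, w_anchor, data, z_full)      # full-batch anchor estimate
    for step in range(n_steps):
        if step % q == 0:                                     # periodically refresh the anchor
            w_anchor, z_full = w.copy(), rng.standard_normal(w.shape)
            g_full = spsa_grad(loss_fn, w_anchor, data, z_full)
        batch = [data[i] for i in rng.choice(len(data), batch_size, replace=False)]
        z = rng.standard_normal(w.shape)                      # shared direction for both points
        g = (spsa_grad(loss_fn, w, batch, z)
             - spsa_grad(loss_fn, w_anchor, batch, z)
             + g_full)                                        # variance-reduced estimate
        w = w - lr * g
    return w
```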
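The 512/256/256 many-shot split could be reproduced along the following lines with the Hugging Face `datasets` library. This is a plausible sketch rather than the paper's sampling code: it assumes both evaluation subsets are drawn from the original labelled validation split (GLUE test labels are hidden), and the default task name is an assumption.

```python
from datasets import load_dataset

def many_shot_splits(task="sst2", n_train=512, n_val=256, n_test=256, seed=42):
    """Randomly sample a 512/256/256 many-shot fine-tuning split from a GLUE task.

    Illustrative only; the paper's exact sampling procedure may differ.
    """
    ds = load_dataset("glue", task)
    train = ds["train"].shuffle(seed=seed).select(range(n_train))
    held_out = ds["validation"].shuffle(seed=seed)
    val = held_out.select(range(n_val))
    test = held_out.select(range(n_val, n_val + n_test))
    return train, val, test
```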
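The FP32/BF16 settings amount to loading the model in full or half precision; a hedged sketch with the Hugging Face `transformers` package is shown below. The model name, label count, and device placement are assumptions for illustration, and the actual fine-tuning scripts live in the repository linked above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

def load_classifier(name="roberta-large", half_precision=False, num_labels=2):
    """Load a sequence-classification model in FP32 or BF16 on a single GPU."""
    dtype = torch.bfloat16 if half_precision else torch.float32
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(
        name, num_labels=num_labels, torch_dtype=dtype)
    return model.to("cuda"), tokenizer
```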