Variance-reduced Zeroth-Order Methods for Fine-Tuning Language Models
Authors: Tanmay Gautam, Youngsuk Park, Hao Zhou, Parameswaran Raman, Wooseok Ha
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated across a range of both masked and autoregressive LMs (up to 7B parameters) on benchmark downstream tasks, MeZO-SVRG outperforms MeZO with up to 20% increase in test accuracies in both full- and partial-parameter fine-tuning settings. |
| Researcher Affiliation | Collaboration | 1University of California, Berkeley, USA 2Amazon AI Research & Education, Santa Clara, USA 3Amazon AI Labs, Santa Clara, USA. |
| Pseudocode | Yes | The method is summarized in Algorithm 1. |
| Open Source Code | Yes | The code for the experiments is available at https://github.com/amazon-science/mezo_svrg. |
| Open Datasets | Yes | We fine-tune on tasks from the NLP GLUE and SuperGLUE benchmarks: Multi-Genre Natural Language Inference Corpus (MNLI), Stanford Question Answering Dataset (QNLI), Stanford Sentiment Treebank (SST-2), Corpus of Linguistic Acceptability (CoLA), and BoolQ (Williams et al., 2018; Wang et al., 2018; Socher et al., 2013; Warstadt et al., 2018; Wang et al., 2019). |
| Dataset Splits | Yes | Similar to Malladi et al. (2023), for each task, our experiments are conducted in a many-shot fine-tuning setting: 512 training examples, 256 validation examples and 256 test samples are randomly sampled from the dataset. |
| Hardware Specification | Yes | All experiments are run on a single GPU; specifically, we consider Nvidia A100 40GB or H100 80GB GPUs. |
| Software Dependencies | No | The paper mentions using "Huggingface datasets library" and "Huggingface transformers package" but does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | Setup. We evaluate on both full (FP32) and half (BF16) precision. We detail the experiment results for the BF16 setting in Appendix J. We mainly consider a prompt-free fine-tuning setting (more challenging loss landscape) but include prompted results for RoBERTa-large (Liu et al., 2019) in Appendix G. ...Further details of the experiment setup and implementation are provided in Appendices D and E. |
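For intuition about the method the table refers to, the sketch below combines a two-point SPSA zeroth-order gradient estimate with an SVRG-style control variate, which is the general idea behind MeZO-SVRG. This is a minimal illustration on a toy least-squares problem, not the paper's Algorithm 1: the batch schedule, step sizes, perturbation reuse, and function names here are assumptions for demonstration only.

```python
import numpy as np

def spsa_grad(loss, theta, batch, z, eps):
    """Two-point SPSA estimate: (f(theta+eps*z) - f(theta-eps*z)) / (2*eps) * z."""
    return (loss(theta + eps * z, batch) - loss(theta - eps * z, batch)) / (2 * eps) * z

def mezo_svrg(loss, theta0, data, lr=0.05, eps=1e-3, epochs=5, batch_size=4, seed=0):
    """Zeroth-order SVRG sketch: minibatch SPSA estimates are variance-reduced
    against a full-batch SPSA estimate taken at a periodic anchor point."""
    rng = np.random.default_rng(seed)
    theta = theta0.astype(float).copy()
    for _ in range(epochs):
        anchor = theta.copy()
        z_full = rng.standard_normal(theta.shape)
        g_full = spsa_grad(loss, anchor, data, z_full, eps)  # full-batch anchor estimate
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            z = rng.standard_normal(theta.shape)  # same perturbation for both terms
            g_cur = spsa_grad(loss, theta, batch, z, eps)
            g_anc = spsa_grad(loss, anchor, batch, z, eps)
            theta -= lr * (g_cur - g_anc + g_full)  # SVRG control-variate step
    return theta
```

Note that the whole update uses only forward evaluations of `loss` (no backpropagation), which is what makes zeroth-order fine-tuning memory-efficient; the SVRG correction `g_cur - g_anc + g_full` is what reduces the variance of the minibatch estimate relative to plain MeZO-style SPSA.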