VeRA: Vector-based Random Matrix Adaptation

Authors: Dawid Jan Kopiczko, Tijmen Blankevoort, Yuki M Asano

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we conduct a series of experiments to evaluate our finetuning method. We start by comparing our approach to LoRA and other baselines on the GLUE and E2E benchmarks. Following this, we turn our attention to instruction-tuning of Llama models, and image classification with Vision Transformers. Next, we select one task and vary the rank for both methods, LoRA and VeRA, to examine how performance scales with the number of trainable parameters. Lastly, an ablation study sheds light on the importance of each component in our method, including the influence of different initializations.
Researcher Affiliation | Collaboration | Dawid J. Kopiczko, QUVA Lab, University of Amsterdam; Tijmen Blankevoort, Qualcomm AI Research; Yuki M. Asano, QUVA Lab, University of Amsterdam
Pseudocode | No | The paper describes its method using mathematical equations and diagrams (Figure 1), but does not provide any structured pseudocode or algorithm blocks. (A hedged sketch of the update rule is given after this table.)
Open Source Code | No | The paper mentions a website: 'Website: https://dkopi.github.io/vera/'. This is a project website (GitHub Pages), not a direct link to a source-code repository for the described method. The paper contains no explicit statement of a code release and no repository link for the code itself.
Open Datasets | Yes | We evaluate our approach on the General Language Understanding Evaluation (GLUE) benchmark (Wang et al., 2019), employing the RoBERTa-base and RoBERTa-large models (Liu et al., 2019). For the E2E benchmark (Novikova et al., 2017), we follow the experimental setup from Hu et al. (2022) and finetune the GPT-2 (Radford et al., 2019) Medium and Large models. (...) We employ the Alpaca dataset (Taori et al., 2023), specifically its cleaned version. (...) To evaluate the method on the image classification task, we adapt Vision Transformer (ViT) (Dosovitskiy et al., 2021), Base and Large variants, on datasets CIFAR100 (Krizhevsky, 2009), Food101 (Bossard et al., 2014), Flowers102 (Nilsback & Zisserman, 2008), and RESISC45 (Cheng et al., 2017). (A hedged data-loading sketch follows this table.)
Dataset Splits | Yes | We perform 5 runs with different random seeds, recording the best epoch's outcome for each run, and report the median of these results. (...) For each dataset we train on a subset of 10 samples per class, and evaluate on the full test set (CIFAR100, Food101, Flowers102) or on all the remaining samples (RESISC45). (A sketch of this seed-and-subset protocol follows this table.)
Hardware Specification | No | The paper mentions using 'a single GPU' and the 'National Supercomputer Snellius and Distributed ASCI Supercomputer 6 (Bal et al., 2016)', but does not provide specific models or detailed specifications for the GPUs, CPUs, or other hardware components used for the experiments. 'A single GPU' is not specific enough.
Software Dependencies | No | The paper mentions software such as 'PyTorch (Paszke et al., 2019)' and 'Hugging Face PEFT (Mangrulkar et al., 2022)' but does not provide specific version numbers for these or any other ancillary software components.
Experiment Setup | Yes | We determine the learning rates and the number of training epochs through hyperparameter tuning; for detailed settings, refer to Table 8 in Appendix A. (...) Table 8: Hyperparameter configurations for different model sizes on GLUE benchmark. Optimizer, Warmup Ratio, and LR Schedule are taken from Hu et al. (2022). (...) Table 9: Hyperparameter configurations for instruction-tuning. (...) Table 10: Hyperparameter configurations for VeRA on the E2E benchmark. (...) Table 11: Hyperparameter configurations for VeRA and LoRA for finetuning ViT on the image classification datasets.
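Since the paper expresses the method only as equations (h = W0 x + diag(b) B diag(d) A x, with frozen random matrices A and B shared across layers and only the scaling vectors d and b trained), the following is a minimal PyTorch sketch of that formulation. The class and variable names, the Kaiming initialization of the shared matrices, and the d_init value are illustrative assumptions, not the authors' released code.

```python
import torch
import torch.nn as nn

class VeRALinear(nn.Module):
    """Sketch of a VeRA-adapted linear layer: h = W0 x + diag(b) B diag(d) A x."""

    def __init__(self, base: nn.Linear, A: torch.Tensor, B: torch.Tensor, d_init: float = 0.1):
        super().__init__()
        self.base = base
        for p in self.base.parameters():       # pretrained weights stay frozen
            p.requires_grad_(False)
        self.register_buffer("A", A)           # frozen random matrix (r, in_features), shared across layers
        self.register_buffer("B", B)           # frozen random matrix (out_features, r), shared across layers
        self.d = nn.Parameter(torch.full((A.shape[0],), d_init))  # trainable scaling vector d
        self.b = nn.Parameter(torch.zeros(base.out_features))     # trainable scaling vector b

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        delta = (x @ self.A.t()) * self.d      # A x, scaled elementwise by d  -> (..., r)
        delta = (delta @ self.B.t()) * self.b  # then B, scaled elementwise by b -> (..., out_features)
        return self.base(x) + delta


# The shared random matrices are generated once and reused by every adapted layer.
r, d_in, d_out = 8, 768, 768
A = nn.init.kaiming_uniform_(torch.empty(r, d_in))
B = nn.init.kaiming_uniform_(torch.empty(d_out, r))
layer = VeRALinear(nn.Linear(d_in, d_out), A, B)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # r + d_out trainable values
```

Because A and B are never updated and are shared, only the vectors d and b need to be stored per adapted layer, which is where the parameter savings over LoRA reported in the paper come from.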
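All datasets quoted in the Open Datasets row are publicly available. The snippet below shows one way a few of them could be fetched via the Hugging Face datasets library; the specific dataset IDs and the use of this library are assumptions for illustration, as the paper does not prescribe a loading pipeline.

```python
from datasets import load_dataset

# Hugging Face Hub dataset IDs (assumed for illustration, not taken from the paper).
sst2 = load_dataset("glue", "sst2")      # one of the GLUE tasks
cifar100 = load_dataset("cifar100")      # image classification
food101 = load_dataset("food101")        # image classification

print(sst2["train"][0])
```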
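The evaluation protocol quoted in the Dataset Splits row (5 random seeds, per-run best epoch, median across runs, and 10 training samples per class for the ViT experiments) could be implemented roughly as below. The helper names and the run_fn callback are hypothetical placeholders, not the authors' code.

```python
import random
from collections import defaultdict
from statistics import median

def sample_k_per_class(examples, k=10, seed=0):
    """Pick k training examples per class label (10 per class in the paper's ViT setup)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex in examples:
        by_label[ex["label"]].append(ex)
    subset = []
    for items in by_label.values():
        subset.extend(rng.sample(items, k))
    return subset

def report_median_over_seeds(run_fn, seeds=(0, 1, 2, 3, 4)):
    """Train once per seed, take each run's best-epoch score (higher is better), report the median."""
    best_scores = [max(run_fn(seed)) for seed in seeds]  # run_fn returns per-epoch validation scores
    return median(best_scores)
```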