Parameter Efficient Quasi-Orthogonal Fine-Tuning via Givens Rotation
Authors: Xinyu Ma, Xu Chu, Zhibang Yang, Yang Lin, Xin Gao, Junfeng Zhao
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on various tasks and pretrained models validate the effectiveness of our methods. In this section, we conduct extensive experiments to evaluate the effectiveness of our methods. |
| Researcher Affiliation | Academia | 1School of Computer Science, Peking University, Beijing, China 2Key Laboratory of High Confidence Software Technologies, Ministry of Education, Beijing, China 3Center on Frontiers of Computing Studies, Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 The fine-tuning and testing procedure of a pre-trained model with (q)GOFT. |
| Open Source Code | Yes | We implement GOFT and qGOFT for fine-tuning DeBERTaV3-base (He et al., 2021) and LLaMA2-7B (Touvron et al., 2023); we also integrate our methods into the PEFT library (Mangrulkar et al., 2022): https://github.com/ArthurLeoM/peft-givens |
| Open Datasets | Yes | Various downstream NLP tasks are applied to fine-tune the PLMs for conducting comparisons between baselines, including natural language understanding (Wang et al., 2018a, GLUE), instruction following (Hendrycks et al., 2021, MMLU) (Chiang et al., 2023, Vicuna-Eval), and question answering (Rajpurkar et al., 2016, SQuAD). We also validate the effectiveness of our method on visual tasks (Zhai et al., 2019, VTAB-1K) by fine-tuning VFMs like ViT-B/16 (Dosovitskiy et al., 2021). |
| Dataset Splits | Yes | We present the detailed dataset statistics of GLUE benchmark (Wang et al., 2018a) in Table 6. (Table 6 shows #Dev column). SQuADv1.1 consists of 87,599 training samples and 10,570 validation samples. |
| Hardware Specification | Yes | The experiments are conducted on a single NVIDIA-A100-80GB GPU or distributedly on a maximum of 4 NVIDIA-RTX3090-24GB GPUs. |
| Software Dependencies | No | The paper mentions software like PyTorch, Hugging Face transformers, the PEFT library, and LLaMA-Factory, but does not specify their version numbers. |
| Experiment Setup | Yes | The specific tuned hyperparameters used in our experiments are presented in Table 5. |
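For context on the Pseudocode and Open Source Code rows above: the paper's core idea is to parameterize orthogonal (or quasi-orthogonal) weight updates as products of Givens rotations, keeping only rotation angles trainable (Algorithm 1). The snippet below is a minimal, illustrative sketch of that parameterization in plain PyTorch; the helper name `givens_rotation`, the toy dimensions, and the coordinate pairing are assumptions made here for illustration and are not the authors' implementation, which lives in the linked peft-givens repository.

```python
import torch

def givens_rotation(d: int, i: int, j: int, theta: torch.Tensor) -> torch.Tensor:
    """Build a d x d Givens rotation acting on coordinates (i, j) by angle theta."""
    G = torch.eye(d)
    c, s = torch.cos(theta), torch.sin(theta)
    G[i, i] = c
    G[j, j] = c
    G[i, j] = -s
    G[j, i] = s
    return G

# Toy example: compose several Givens rotations into one orthogonal matrix and
# apply it to a frozen pretrained weight, so only the angles are trainable.
d = 8
W_pretrained = torch.randn(d, d)                     # frozen pretrained weight (toy)
thetas = torch.nn.Parameter(torch.zeros(d // 2))     # trainable rotation angles
pairs = [(2 * k, 2 * k + 1) for k in range(d // 2)]  # one disjoint pairing of coordinates

R = torch.eye(d)
for (i, j), theta in zip(pairs, thetas):
    R = givens_rotation(d, i, j, theta) @ R          # accumulate the orthogonal factor

W_adapted = R @ W_pretrained                         # orthogonally rotated weight
```

The sketch only illustrates the parameter-efficiency argument: each Givens rotation costs a single angle, so an orthogonal update over d coordinates needs O(d) parameters per pairing rather than the O(d^2) of a dense orthogonal matrix.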