Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning
Authors: Arijit Sehanobish, Kumar Avinava Dubey, Krzysztof M Choromanski, Somnath Basu Roy Chowdhury, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a general framework for parameter efficient fine-tuning using structured unrestricted-rank matrices (SURM), which can serve as a drop-in replacement for popular approaches such as Adapters and LoRA. ... SURMs achieve 5-7% accuracy gains on various image classification tasks while replacing low-rank matrices in LoRA. It also results in up to 12x reduction of the number of parameters in adapters (with virtually no loss in quality) on the GLUE benchmark. |
| Researcher Affiliation | Collaboration | 1 Independent, 2 Google Research, 3 Google DeepMind, 4 Columbia University, 5 UNC Chapel Hill |
| Pseudocode | No | The paper describes its methods in text but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/arijitthegame/structured-matrices-PEFT. |
| Open Datasets | Yes | We evaluate SURM on several vision datasets: CIFAR10, CIFAR100 [39], SUN397 [79], DTD [16] and STL10 [17]. We evaluate SURM in low data regime using VTAB-1k datasets [85] and the ViT model. On ImageNet [18] and iNat2021 [72]... We extensively evaluate SURM models on the GLUE benchmark [74]... using Synapse multi-organ segmentation dataset [82]. |
| Dataset Splits | Yes | VTAB-1k datasets [85]... only 1000 training examples. We further evaluate the performance of SURM in a large-scale setting using the iNat2021 dataset [72], which contains over 2.7 million training images, 100K validation images, and 500K test images, spanning 10,000 species (classes). 30 abdominal CT scans in the MICCAI 2015 Multi-Atlas Abdomen Labeling Challenge are divided into 18 training cases and 12 test cases. There are 3779 axial contrast-enhanced abdominal CT images in total and the training set contains 2212 axial slices. |
| Hardware Specification | Yes | The experiments are run on TPUv4 4×2 compute resources. We use an A100 40GB GPU for this experiment. |
| Software Dependencies | No | The code to run NLP experiments is developed using PyTorch using Huggingface, Adapter-transformer, PEFT libraries, and the original LoRA codebase. For ViT experiments, we use JAX [5] and the open-sourced JAX implementation of ViT. The paper names these libraries and frameworks but gives no specific version numbers. |
| Experiment Setup | Yes | For all the experiments, we use AdamW optimizer [47] with a warmup ratio of 0.06, a linear learning rate scheduler, and a sequence length of 128. For our methods and the Compacter baseline, we use a batch size of 64. We report the rest of the hyperparameters in Table 4. ... For the image experiments, we use Adam optimizer [33] with 20k max iterations per dataset with a batch size of 64. The learning rate used is 5e-5 except for SVHN where we use a learning rate of 5e-4. |
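The abstract quoted above describes SURMs as drop-in replacements for the low-rank matrices in LoRA. To make the "unrestricted rank" point concrete, here is a minimal NumPy sketch of one structured family, a circulant parameterization: an n×n weight update stored as a single length-n vector, which can still be full rank. The function name and setup are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def circulant_update(v):
    """Build a circulant matrix whose rows are cyclic shifts of v.

    Illustrative only: an n x n update costs n trainable parameters
    (vs. 2*n*r for a rank-r LoRA update A @ B), yet the resulting
    matrix is generically full rank.
    """
    n = len(v)
    return np.stack([np.roll(v, i) for i in range(n)])

rng = np.random.default_rng(0)
n = 4
W0 = rng.standard_normal((n, n))   # frozen pretrained weight
v = rng.standard_normal(n)         # trainable parameter vector
W = W0 + circulant_update(v)       # effective finetuned weight
```

The design trade-off sketched here is the paper's central theme: structured matrices decouple parameter count from rank, whereas classic LoRA ties them together.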
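The experiment-setup row quotes AdamW with a warmup ratio of 0.06 and a linear learning-rate scheduler. A minimal sketch of that schedule follows; the function name and total-step handling are assumptions for illustration, not the paper's code.

```python
def lr_at_step(step, total_steps, base_lr=5e-5, warmup_ratio=0.06):
    """Linear warmup to base_lr over warmup_ratio * total_steps,
    then linear decay to 0 (the schedule commonly paired with AdamW)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    return base_lr * (total_steps - step) / max(1, total_steps - warmup_steps)
```

For example, with 1000 total steps the rate rises linearly over the first 60 steps, peaks at `base_lr`, and decays linearly back to zero by the final step.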