Structured Unrestricted-Rank Matrices for Parameter Efficient Finetuning

Authors: Arijit Sehanobish, Kumar Avinava Dubey, Krzysztof M Choromanski, Somnath Basu Roy Chowdhury, Deepali Jain, Vikas Sindhwani, Snigdha Chaturvedi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose a general framework for parameter efficient fine-tuning using structured unrestricted-rank matrices (SURM), which can serve as a drop-in replacement for popular approaches such as Adapters and LoRA. ... SURMs achieve 5-7% accuracy gains on various image classification tasks while replacing low-rank matrices in LoRA. It also results in up to 12x reduction of the number of parameters in adapters (with virtually no loss in quality) on the GLUE benchmark.
Researcher Affiliation | Collaboration | 1 Independent, 2 Google Research, 3 Google DeepMind, 4 Columbia University, 5 UNC Chapel Hill
Pseudocode | No | The paper describes its methods in text but does not include formal pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/arijitthegame/structured-matrices-PEFT.
Open Datasets | Yes | We evaluate SURM on several vision datasets: CIFAR10, CIFAR100 [39], SUN397 [79], DTD [16] and STL10 [17]. We evaluate SURM in low data regime using VTAB-1k datasets [85] and the ViT model. On ImageNet [18] and iNat2021 [72]... We extensively evaluate SURM models on the GLUE benchmark [74]... using Synapse multi-organ segmentation dataset [82].
Dataset Splits | Yes | VTAB-1k datasets [85]... only 1000 training examples. We further evaluate the performance of SURM in a large-scale setting using the iNat2021 dataset [72], which contains over 2.7 million training images, 100K validation images, and 500K test images, spanning 10,000 species (classes). 30 abdominal CT scans in the MICCAI 2015 Multi-Atlas Abdomen Labeling Challenge are divided into 18 training cases and 12 test cases. There are 3779 axial contrast-enhanced abdominal CT images in total and the training set contains 2212 axial slices.
Hardware Specification | Yes | The experiments are run on TPUv4 4 2 compute resources. We use an A100 40GB GPU for this experiment.
Software Dependencies | No | The code to run NLP experiments is developed using PyTorch using HuggingFace, Adapter-transformer, PEFT libraries, and the original LoRA codebase. For ViT experiments, we use JAX [5] and the open-sourced JAX implementation of ViT. This lists software but not specific version numbers for the libraries/frameworks.
Experiment Setup | Yes | For all the experiments, we use AdamW optimizer [47] with a warmup ratio of 0.06, a linear learning rate scheduler, and a sequence length of 128. For our methods and the Compacter baseline, we use a batch size of 64. We report the rest of the hyperparameters in Table 4. ... For the image experiments, we use Adam optimizer [33] with 20k max iterations per dataset with a batch size of 64. The learning rate used is 5e-5 except for SVHN where we use a learning rate of 5e-4.
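The NLP setup quoted above (warmup ratio 0.06 with a linear learning-rate scheduler) corresponds to a standard warmup-then-linear-decay multiplier. A minimal sketch of that schedule, assuming the common convention of decaying to zero after warmup (the function name and exact endpoint are our assumptions, not stated in the paper):

```python
def linear_warmup_linear_decay(step, total_steps, warmup_ratio=0.06):
    """Multiplicative LR factor for a given step: linear warmup over the
    first warmup_ratio fraction of training, then linear decay to zero.
    The peak learning rate (e.g. 5e-5 in the quoted setup) is multiplied
    by this factor at each optimizer step."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

# Example: with 1000 total steps, warmup covers the first 60 steps.
factors = [linear_warmup_linear_decay(s, 1000) for s in (0, 30, 60, 1000)]
print(factors)  # [0.0, 0.5, 1.0, 0.0]
```

In frameworks such as HuggingFace Transformers this behavior matches `get_linear_schedule_with_warmup`; the sketch only makes the arithmetic explicit.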