NOLA: Compressing LoRA using Linear Combination of Random Basis
Authors: Soroush Abbasi Koohpayegani, Navaneet K L, Parsa Nooralinejad, Soheil Kolouri, Hamed Pirsiavash
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present adaptation results using GPT-2, LLaMA-2, and ViT in natural language and computer vision tasks. NOLA performs as well as LoRA models with far fewer parameters compared to LoRA with rank one, the best compression LoRA can achieve. Particularly, on LLaMA-2 70B, our method is almost 20 times more compact than the most compressed LoRA without degradation in accuracy. |
| Researcher Affiliation | Academia | University of California, Davis; Vanderbilt University |
| Pseudocode | No | The paper does not contain a figure, block, or section explicitly labeled 'Pseudocode' or 'Algorithm', nor does it present structured steps formatted like code or an algorithm. |
| Open Source Code | Yes | Our code is available here: https://github.com/UCDvision/NOLA |
| Open Datasets | Yes | Datasets: We utilize the following datasets for our Natural Language Generation (NLG) task: E2E NLG Challenge (Novikova et al., 2017) serves as a commonly used benchmark for evaluating NLG models. DART (Nan et al., 2020) is another significant dataset employed for evaluating data-to-text generation. WebNLG (Gardent et al., 2017) is a data-to-text dataset... We use CIFAR10 (Krizhevsky et al., 2014), CIFAR100 (Krizhevsky et al., 2009), CUB-200-2011 (Welinder et al., 2010), Caltech-101 (Fei-Fei et al., 2004), Aircraft (Maji et al., 2013), Food101 (Bossard et al., 2014), Pets (Parkhi et al., 2012) and SUN397 (Xiao et al., 2010) datasets for finetuning. |
| Dataset Splits | Yes | Datasets: We utilize the following datasets for our Natural Language Generation (NLG) task: E2E NLG Challenge (Novikova et al., 2017) serves as a commonly used benchmark for evaluating NLG models. It comprises 51,200 samples, distributed as follows: 42,200 for training, 4,600 for validation, and an additional 4,600 for testing. |
| Hardware Specification | Yes | Implementation Details: We trained our models using a single NVIDIA RTX 6000 Ada Generation GPU. (...) We optimize for one epoch on the Alpaca dataset with a batch size of 256 using four RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions software like the 'Timm library' and 'Adam optimizer' but does not provide specific version numbers for any of its software dependencies. |
| Experiment Setup | Yes | Implementation Details: We train our models for 5 epochs with a learning rate of 0.1 and no weight decay. We use a batch size of 8. We use a rank of 8 for NOLA in our experiments. Like LoRA, we scale AB with c/r, where c is a hyperparameter and r is the rank. We use the default value of c = 1. |
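
For context on the method named in the title, the sketch below illustrates the core idea in a few lines: instead of training the LoRA factors A and B directly, NOLA re-parameterizes each as a linear combination of frozen, seed-generated random basis matrices and trains only the mixing coefficients, reusing the c/r scaling noted in the Experiment Setup row. This is a minimal, hedged illustration, not the authors' implementation; the class name `NOLALinear`, the number of basis matrices, and the initialization are assumptions made for the example (see the official repository linked in the Open Source Code row for the real code).

```python
# Minimal sketch of a NOLA-style adapter (illustrative; names and defaults
# such as NOLALinear, num_basis_a, and num_basis_b are assumptions, not the
# authors' code). Only the mixing coefficients alpha and beta are trained.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NOLALinear(nn.Module):
    def __init__(self, base_linear: nn.Linear, rank: int = 8,
                 num_basis_a: int = 64, num_basis_b: int = 64,
                 c: float = 1.0, seed: int = 0):
        super().__init__()
        self.base = base_linear.requires_grad_(False)  # frozen pretrained layer
        out_f, in_f = base_linear.weight.shape
        g = torch.Generator().manual_seed(seed)  # basis is reproducible from the seed
        # Frozen random basis matrices; stored as buffers, never updated.
        self.register_buffer("basis_A", torch.randn(num_basis_a, rank, in_f, generator=g))
        self.register_buffer("basis_B", torch.randn(num_basis_b, out_f, rank, generator=g))
        # Trainable mixing coefficients -- the only adapted parameters.
        self.alpha = nn.Parameter(torch.zeros(num_basis_a))
        self.beta = nn.Parameter(0.01 * torch.randn(num_basis_b))
        self.scale = c / rank  # same c/r scaling as LoRA

    def forward(self, x):
        A = torch.einsum("k,kri->ri", self.alpha, self.basis_A)  # (rank, in_f)
        B = torch.einsum("k,kor->or", self.beta, self.basis_B)   # (out_f, rank)
        delta_w = self.scale * (B @ A)                            # low-rank weight update
        return self.base(x) + F.linear(x, delta_w)


# Toy usage: wrap one pretrained layer; only alpha and beta receive gradients.
layer = NOLALinear(nn.Linear(768, 768), rank=8)
out = layer(torch.randn(4, 768))
```

Because the basis matrices are deterministic given a seed, only the coefficient vectors (and the seed) need to be stored per adapted layer, which is how the method can compress below the rank-one LoRA floor mentioned in the Research Type row.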