Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Compressing Large Language Models using Low Rank and Low Precision Decomposition
Authors: Rajarshi Saha, Naomi Sagan, Varun Srivastava, Andrea Goldsmith, Mert Pilanci
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results illustrate that compressing LlaMa-2 7B/13B/70B and LlaMa-3 8B models using CALDERA outperforms existing post-training LLM compression techniques in the regime of less than 2.5 bits per parameter. |
| Researcher Affiliation | Academia | Rajarshi Saha (Stanford University); Naomi Sagan (Stanford University); Varun Srivastava (Stanford University); Andrea J. Goldsmith (Princeton University); Mert Pilanci (Stanford University) |
| Pseudocode | Yes | Algorithm 1: CALDERA: Calibration Aware Low-Precision DEcomposition with Low-Rank Adaptation; Algorithm 2: LPLRFACTORIZE(A, k, X, Q_L, Q_R, T_in): LPLR factorization submodule |
| Open Source Code | Yes | The implementation is available at: https://github.com/pilancilab/caldera. |
| Open Datasets | Yes | The performance of CALDERA is evaluated using perplexity on the test splits of the Wikitext2 [25] and C4 [6] datasets, as well as task-specific goodness-of-fit metrics such as zero-shot accuracy for sequence classification. Specifically, zero-shot accuracy was measured on the Winogrande [19], RTE [1, 40], PiQA [2], ARC-Easy, and ARC-Challenge [4] tasks. |
| Dataset Splits | Yes | The calibration dataset is 256 samples in total, with 192 data points in the training split and 64 in the evaluation split. |
| Hardware Specification | Yes | Experiments were performed on either NVIDIA RTX A6000, NVIDIA A10G, or NVIDIA H100 GPUs. |
| Software Dependencies | No | The paper mentions "PyTorch" and "Hugging Face implementations" but does not specify their version numbers, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | For all CALDERA decompositions, the number of alternating iterations between updating Q and L, R (i.e., T_out in Alg. 1) is 15. For decompositions with quantized low-rank factors, except LLaMa-2 7B and LLaMa-3 8B, the number of LPLR iterations (i.e., T_in in Alg. 2) is 10. For LLaMa-2 7B and LLaMa-3 8B, the number of LPLR iterations is 50. ... RHT fine-tuning was performed for 5 epochs with a learning rate of 10^-3. ... Table 8: Hyperparameter settings for low-rank adaptation*. Batch size refers to the per-device batch size. All fine-tuning experiments are parallelized across four GPUs. |
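
The Experiment Setup row describes alternating between updating Q and the low-rank factors L, R for T_out = 15 outer iterations. The sketch below illustrates only that alternating structure, under strong simplifying assumptions: it uses a plain round-to-nearest uniform quantizer and a truncated SVD, ignores the calibration data X, and keeps L and R in full precision. It is not the paper's Algorithm 1 or 2, and the function names (`uniform_quantize`, `low_rank`, `alternating_decompose`) are hypothetical. For the actual method, see the repository linked in the Open Source Code row.

```python
import numpy as np


def uniform_quantize(a, bits):
    """Round-to-nearest uniform quantizer over the range of `a`.

    Assumption: a simple stand-in, not the quantizer used in the paper.
    """
    levels = 2 ** bits - 1
    lo, hi = a.min(), a.max()
    if hi == lo:
        return a.copy()
    scale = (hi - lo) / levels
    return np.round((a - lo) / scale) * scale + lo


def low_rank(a, k):
    """Best rank-k factors of `a` in Frobenius norm, via truncated SVD."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k] * s[:k], vt[:k, :]


def alternating_decompose(w, k, bits, t_out=15):
    """Approximate w by q + l @ r, alternating the two updates t_out times.

    Assumption: no calibration data and unquantized l, r, unlike the
    calibration-aware procedure the table refers to.
    """
    l = np.zeros((w.shape[0], k))
    r = np.zeros((k, w.shape[1]))
    for _ in range(t_out):
        q = uniform_quantize(w - l @ r, bits)  # update the low-precision part
        l, r = low_rank(w - q, k)              # update the low-rank correction
    return q, l, r
```

A call such as `alternating_decompose(w, k=8, bits=2)` returns a matrix `q` with at most four distinct values plus rank-8 factors. Because the last step is a truncated SVD of `w - q`, the residual of `q + l @ r` is never larger than that of `q` alone.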