Matrix Compression via Randomized Low Rank and Low Precision Factorization
Authors: Rajarshi Saha, Varun Srivastava, Mert Pilanci
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-7b. Our results illustrate that we can achieve compression ratios as aggressive as one bit per matrix coordinate, all while surpassing or maintaining the performance of traditional compression techniques. |
| Researcher Affiliation | Academia | Rajarshi Saha, Varun Srivastava, Mert Pilanci Department of Electrical Engineering Stanford University Stanford, CA 94305, USA {rajsaha,vsriva,pilanci}@stanford.edu |
| Pseudocode | Yes | Algorithm 1: LPLR: Randomized Low-Precision Low-Rank factorization. ... Algorithm 2: Direct-SVD quant.: Directly quantizing the optimal low-rank factorization. |
| Open Source Code | Yes | Our code is available at https://github.com/pilancilab/matrix-compressor. |
| Open Datasets | Yes | For CIFAR-10 and CIFAR-100, we embed the entire dataset using MobileNetV3 (Howard et al. [24]) pretrained on ImageNet (Deng et al. [12])... The IMDB [2] dataset consists of 25,000 train and test sentences... The Emotion [1] dataset is a sentiment analysis dataset, containing 16,000 train and 2,000 test sentences... |
| Dataset Splits | No | The paper specifies training and test splits; for example, for CIFAR-10 it states: "The dataset is split into 50,000 training images and 10,000 test images". However, it does not explicitly mention a separate validation split or its size. |
| Hardware Specification | Yes | All experiments were performed on a single NVIDIA TITAN RTX GPU. |
| Software Dependencies | No | The main algorithm is implemented in PyTorch (Paszke et al. [46]), and utilizes Hugging Face [80] implementations of all datasets and large language models. The paper mentions software but does not specify version numbers for PyTorch or the Hugging Face libraries. |
| Experiment Setup | Yes | We utilize a uniform bit budget of B = 8 bits for both quantizers across all cases. |
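
The Pseudocode row above refers to Algorithm 1 (LPLR: Randomized Low-Precision Low-Rank factorization). The following is a minimal NumPy sketch of the idea only, assuming a Gaussian sketching matrix and simple uniform scalar quantization; the function names, the rank/seed arguments, and the 8-bit default are illustrative, and the authors' released PyTorch implementation in the linked repository (https://github.com/pilancilab/matrix-compressor) should be treated as authoritative.

```python
import numpy as np

def uniform_quantize(M, bits=8):
    """Uniformly quantize the entries of M to 2**bits levels over its dynamic range."""
    lo, hi = M.min(), M.max()
    if hi == lo:
        return np.full_like(M, lo)
    scale = (hi - lo) / (2 ** bits - 1)
    return np.round((M - lo) / scale) * scale + lo

def lplr(A, rank, bits=8, seed=0):
    """Sketch of a randomized low-precision low-rank factorization.

    Returns factors (Z, W) such that Z @ W approximates A, with both
    factors stored at reduced precision.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    S = rng.standard_normal((d, rank)) / np.sqrt(rank)   # random Gaussian sketch
    Z = uniform_quantize(A @ S, bits)                     # quantized range sketch
    W, *_ = np.linalg.lstsq(Z, A, rcond=None)             # least-squares fit of A onto Z
    W = uniform_quantize(W, bits)                         # quantize the second factor
    return Z, W

# Usage: compress a random matrix and report the relative Frobenius error.
A = np.random.default_rng(1).standard_normal((512, 256))
Z, W = lplr(A, rank=64, bits=8)
err = np.linalg.norm(A - Z @ W) / np.linalg.norm(A)
print(f"relative error: {err:.3f}")
```

The 8-bit budget in the sketch mirrors the uniform bit budget quoted in the Experiment Setup row; in practice the two quantizers may be assigned different budgets and the quantization scheme itself may differ from the simple range-based uniform quantizer assumed here.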