Matrix Compression via Randomized Low Rank and Low Precision Factorization

Authors: Rajarshi Saha, Varun Srivastava, Mert Pilanci

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the efficacy of our algorithm in image compression, nearest neighbor classification of image and text embeddings, and compressing the layers of LlaMa-7b. Our results illustrate that we can achieve compression ratios as aggressive as one bit per matrix coordinate, all while surpassing or maintaining the performance of traditional compression techniques.
Researcher Affiliation | Academia | Rajarshi Saha, Varun Srivastava, Mert Pilanci, Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA. {rajsaha,vsriva,pilanci}@stanford.edu
Pseudocode | Yes | Algorithm 1: LPLR: Randomized Low-Precision Low-Rank factorization. ... Algorithm 2: Direct-SVD quant.: Directly quantizing the optimal low-rank factorization. (A hedged code sketch of the LPLR structure is given after this table.)
Open Source Code | Yes | Our code is available at https://github.com/pilancilab/matrix-compressor.
Open Datasets | Yes | For CIFAR-10 and CIFAR-100, we embed the entire dataset using MobileNetV3 (Howard et al. [24]) pretrained on ImageNet (Deng et al. [12])... The IMDB (mte [2]) dataset consists of 25,000 train and test sentences... The Emotion (mte [1]) dataset is a sentiment analysis dataset, containing 16,000 train and 2,000 test sentences... (An illustrative embedding sketch follows the table.)
Dataset Splits | No | The paper specifies training and test splits; for example, for CIFAR-10 it states: "The dataset is split into 50,000 training images and 10,000 test images". However, it does not explicitly mention a separate validation split or its size.
Hardware Specification | Yes | All experiments were performed on a single NVIDIA TITAN RTX GPU.
Software Dependencies | No | The main algorithm is implemented in PyTorch (Paszke et al. [46]) and uses Hugging Face [80] implementations of all datasets and large language models. The paper names the software but does not specify version numbers for PyTorch or the Hugging Face libraries.
Experiment Setup | Yes | We utilize a uniform bit budget B1 = B2 = 8 bits for the quantizers Q1, Q2 across all cases. (The bit accounting behind the "one bit per matrix coordinate" figure is sketched after the table.)
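
The paper's pseudocode is not reproduced on this page; the snippet below is a minimal PyTorch sketch of the LPLR structure referenced in the Pseudocode row (quantize a random Gaussian sketch of the matrix, then quantize a least-squares right factor). The uniform min-max quantizer, its scaling, and the sketch dimension m are illustrative assumptions, not the paper's exact choices.

```python
import torch

def uniform_quantize(X, bits=8):
    # Illustrative min-max uniform quantizer standing in for Q1/Q2;
    # the paper's quantizers may handle the dynamic range differently.
    lo, hi = X.min(), X.max()
    scale = (hi - lo) / (2 ** bits - 1)
    return torch.round((X - lo) / scale) * scale + lo

def lplr(A, m, bits=8):
    # Randomized low-precision low-rank factorization (LPLR-style structure):
    # 1) sketch the column space with a random Gaussian matrix S,
    # 2) quantize the sketch to obtain the left factor Z,
    # 3) solve a least-squares problem for the right factor and quantize it too.
    n, d = A.shape
    S = torch.randn(d, m) / m ** 0.5
    Z = uniform_quantize(A @ S, bits)      # left factor, low precision
    W = torch.linalg.pinv(Z) @ A           # least-squares right factor
    return Z, uniform_quantize(W, bits)    # A is approximated by Z @ W

A = torch.randn(512, 1024)
Z, W = lplr(A, m=64)
print(torch.linalg.matrix_norm(A - Z @ W) / torch.linalg.matrix_norm(A))
```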
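For the Open Datasets row, the following is a hedged torchvision sketch of embedding CIFAR-10 with an ImageNet-pretrained MobileNetV3 as a fixed feature extractor. The specific model variant, weights, and preprocessing are assumptions and may differ from the paper's setup (the weights API requires torchvision >= 0.13).

```python
import torch
import torchvision
from torchvision.models import mobilenet_v3_small, MobileNet_V3_Small_Weights

# Assumed variant and weights; the paper may use a different MobileNetV3 configuration.
weights = MobileNet_V3_Small_Weights.IMAGENET1K_V1
model = mobilenet_v3_small(weights=weights)
model.classifier = torch.nn.Identity()   # keep pooled features, drop the ImageNet head
model.eval()

preprocess = weights.transforms()        # ImageNet resizing and normalization
testset = torchvision.datasets.CIFAR10(root="data", train=False,
                                       download=True, transform=preprocess)
loader = torch.utils.data.DataLoader(testset, batch_size=256)

with torch.no_grad():
    embeddings = torch.cat([model(x) for x, _ in loader])   # shape (10000, 576)
```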
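To connect the 8-bit budget in the Experiment Setup row with the "one bit per matrix coordinate" figure quoted in the Research Type row, a back-of-the-envelope count of the storage for the two quantized factors is shown below; the dimensions are illustrative, not taken from the paper.

```python
# Storage for a rank-m factorization Z (n x m) and W (m x d) at B1 = B2 = 8 bits,
# compared against the n x d entries of the original matrix.
n, d, m = 4096, 4096, 256        # illustrative dimensions
B1 = B2 = 8
bits_per_coordinate = (n * m * B1 + m * d * B2) / (n * d)
print(bits_per_coordinate)       # 1.0 bit per matrix coordinate
print(32 / bits_per_coordinate)  # 32x compression relative to fp32 storage
```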