ESPACE: Dimensionality Reduction of Activations for Model Compression

Authors: Charbel Sakr, Brucek Khailany

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we report on experimental studies investigating LLM compression using ESPACE. Accuracy is evaluated in two ways: perplexity measured on the Wikitext-103 dataset [36] and zero-shot downstream task accuracy of: BoolQ (BQ) [37], HellaSwag (HS) [38], PIQA (PQ) [39], RACE (RA) [40], and WinoGrande (WG) [41]. (A hedged perplexity-evaluation sketch follows the table.)
Researcher Affiliation | Industry | Charbel Sakr, NVIDIA Research, csakr@nvidia.com; Brucek Khailany, NVIDIA Research, bkhailany@nvidia.com
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | As such, we believe the description of the work in the paper is sufficient for reproducibility; yet, we are happy to consider open sourcing our code in the future.
Open Datasets | Yes | Accuracy is evaluated in two ways: perplexity measured on the Wikitext-103 dataset [36] and zero-shot downstream task accuracy of: BoolQ (BQ) [37], HellaSwag (HS) [38], PIQA (PQ) [39], RACE (RA) [40], and WinoGrande (WG) [41]. ... Retraining simply extends the models' pre-training sessions and uses the 330B-token MTNLG dataset [43], which was used to train GPT3 models.
Dataset Splits | Yes | The Wikitext-103 dataset is split into train, validation, and test sets. We use 512 random sequences from the training set for calibrating projection matrices required by ESPACE. We use the validation set for layer-wise sensitivity studies. (A hedged calibration sketch follows the table.)
Hardware Specification | Yes | We measure using an NVIDIA A100 GPU and a simple, un-optimized implementation (see Appendix B.4). (A hedged latency-timing sketch follows the table.)
Software Dependencies | No | Our implementation is built on top of Megatron-LM [33], which itself is based on the PyTorch framework. ... We then use the CuPy library in RAPIDS to perform fast (a few milliseconds per auto-correlation matrix) eigenvalue decomposition on GPUs. (Specific version numbers for these software components are not provided.)
Experiment Setup | Yes | For GPT3-1.3B, the initial learning rate is set to 1.0 × 10^-4, the final learning rate is set to 1.0 × 10^-5, and the global batch size is set to 512. (Similar details are provided for other models in Appendix B.3.) (A hedged optimizer-configuration sketch follows the table.)
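
The evaluation quoted in the Research Type row combines perplexity on Wikitext-103 with zero-shot task accuracy. As a rough illustration of the perplexity half of that protocol, the sketch below scores a causal language model over non-overlapping windows of tokenized text. The 2048-token window, the generic `model` interface returning `.logits`, and the non-overlapping chunking are assumptions for illustration, not the paper's exact evaluation harness.

```python
# Minimal perplexity sketch, assuming a Hugging-Face-style causal LM whose
# forward pass returns .logits; window length and chunking are illustrative.
import torch
import torch.nn.functional as F

@torch.no_grad()
def perplexity(model, token_ids, window=2048, device="cuda"):
    """token_ids: 1-D LongTensor holding the full tokenized evaluation text."""
    model.eval().to(device)
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, token_ids.numel() - 1, window):
        chunk = token_ids[start : start + window + 1].unsqueeze(0).to(device)
        inputs, targets = chunk[:, :-1], chunk[:, 1:]
        logits = model(inputs).logits  # shape (1, seq_len, vocab)
        loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1), reduction="sum"
        )
        nll_sum += loss.item()
        n_tokens += targets.numel()
    # Perplexity is the exponential of the mean per-token negative log-likelihood.
    return float(torch.exp(torch.tensor(nll_sum / n_tokens)))
```

The Dataset Splits and Software Dependencies rows describe calibrating ESPACE's projection matrices from 512 random Wikitext-103 training sequences: activation auto-correlation matrices are accumulated and then eigendecomposed on GPU with CuPy. The sketch below follows that flow; the way activations are captured, the double-precision accumulation, and the rank parameter `k` are illustrative assumptions rather than the paper's exact calibration code.

```python
# Hedged sketch: accumulate an activation auto-correlation matrix over
# calibration sequences, then take its leading eigenvectors as a projection.
import cupy as cp
import torch

def accumulate_autocorrelation(activations):
    """activations: iterable of (tokens, hidden) torch tensors captured from one layer input."""
    R, n = None, 0
    for x in activations:
        x = x.reshape(-1, x.shape[-1]).double()   # flatten batch/sequence dims
        R = x.T @ x if R is None else R + x.T @ x
        n += x.shape[0]
    return (R / n).cpu().numpy()                   # averaged auto-correlation, (d, d)

def projection_from_autocorrelation(R_np, k):
    """Return the k leading eigenvectors of the auto-correlation matrix, shape (d, k)."""
    R_gpu = cp.asarray(R_np)
    eigvals, eigvecs = cp.linalg.eigh(R_gpu)       # eigenvalues in ascending order
    return cp.asnumpy(eigvecs[:, -k:])             # keep the top-k eigenvectors
```

The Hardware Specification row reports latency measured on an A100 with a simple, un-optimized implementation. One common way to obtain such numbers is CUDA-event timing around the operation of interest; the sketch below times a half-precision matrix multiply, with shapes and iteration counts being illustrative assumptions, not the paper's benchmarking code.

```python
# Hedged sketch of GPU latency measurement with CUDA events.
import torch

def time_matmul(m, k, n, iters=100, warmup=10, device="cuda"):
    a = torch.randn(m, k, device=device, dtype=torch.float16)
    b = torch.randn(k, n, device=device, dtype=torch.float16)
    for _ in range(warmup):                      # warm up kernels and caches
        a @ b
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters       # average milliseconds per call
```

Finally, the Experiment Setup row quotes the GPT3-1.3B retraining hyperparameters (initial learning rate 1.0 × 10^-4, final learning rate 1.0 × 10^-5, global batch size 512). The snippet below wires those numbers into a PyTorch optimizer and scheduler as a sketch; the choice of AdamW and a cosine decay to the final learning rate are assumptions, since the paper's retraining runs inside Megatron-LM with its own optimizer configuration.

```python
# Hedged sketch: the quoted GPT3-1.3B retraining hyperparameters in PyTorch form.
import torch

GLOBAL_BATCH_SIZE = 512  # sequences per optimizer step, as quoted above

def build_optimizer(model, total_steps):
    optimizer = torch.optim.AdamW(model.parameters(), lr=1.0e-4)   # initial LR from the paper
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=total_steps, eta_min=1.0e-5               # final LR from the paper
    )
    return optimizer, scheduler
```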