Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Authors: Xi Zhang, Xiaolin Wu, Jiamang Wang, Weisi Lin

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on multiple benchmarks show that our approach achieves a better trade-off between model size and accuracy compared to existing post-training quantization baselines, highlighting its effectiveness in deploying large models under stringent resource constraints.
Researcher Affiliation	Collaboration	¹Nanyang Technological University, ²Alibaba Group, ³Southwest Jiaotong University
Pseudocode	Yes	We present the full pseudocode for GLVQ in Algorithm 1. Starting from initial values G_g^(0) and μ_g^(0), each iteration (i) reshapes the weight block, applies the group-specific μ_g-law companding, and produces latent vectors Y_g; (ii) quantizes these vectors via Babai rounding to obtain integer codes Z_g; (iii) reconstructs provisional weights Ŵ_g by inverse companding the lattice outputs; and (iv) minimizes a reconstruction loss augmented with a Frobenius penalty on G_g. Gradients update the generation matrix and curvature parameter, while Z_g is implicitly refreshed by Babai rounding at every iteration. The loop stops when the relative loss reduction falls below ε, returning the final compact representation Ŵ_g that combines group-specific lattice precision with adaptive companding.
Open Source Code	Yes	Our source code is available in a GitHub repository: https://github.com/xzhang9308/GLVQ.
Open Datasets	Yes	Our evaluation focuses on perplexity over the Wikitext-2 [37] and C4 [44] datasets, utilizing context lengths of 2048 for Llama 1 and 4096 for Llama 2 models. For zero-shot tasks, we use the LM Eval framework to measure accuracy on tasks such as ARC, PIQA, and the Winograd Schema Challenge (Wino). We adopt 4M tokens from the RedPajama 1T dataset [57] as the calibration sequences in our experiments.
Dataset Splits	Yes	Our evaluation focuses on perplexity over the Wikitext-2 [37] and C4 [44] datasets, utilizing context lengths of 2048 for Llama 1 and 4096 for Llama 2 models. For zero-shot tasks, we use the LM Eval framework to measure accuracy on tasks such as ARC, PIQA, and the Winograd Schema Challenge (Wino).
Hardware Specification	Yes	We implement our method using PyTorch [42] and CUDA [39], with all experiments conducted on NVIDIA A100 GPUs. For timing experiments, we use an NVIDIA RTX 4090 GPU.
Software Dependencies	Yes	We implement our method using PyTorch [42] and CUDA [39], with all experiments conducted on NVIDIA A100 GPUs. For timing experiments, we use an NVIDIA RTX 4090 GPU. [39] NVIDIA. CUDA Toolkit. https://developer.nvidia.com/cuda-toolkit, 2020. Version 10.2.89.
Experiment Setup	Yes	Specifically, we evaluate perplexity on the Wikitext-2 [37] and C4 [44] datasets, utilizing context lengths of 2048 for Llama 1 and 4096 for Llama 2 models. For zero-shot tasks, we use the LM Eval framework to measure accuracy on tasks such as ARC, PIQA, and the Winograd Schema Challenge (Wino). We adopt 4M tokens from the RedPajama 1T dataset [57] as the calibration sequences in our experiments. We implement two variants of our model with lattice dimensions d = 8 and d = 32, referred to as GLVQ-8D and GLVQ-32D, respectively.
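The Pseudocode row summarizes one iteration of Algorithm 1. The sketch below illustrates only the forward part of steps (i) to (iv) for a single group, under assumptions that do not come from the paper: weights are already reshaped into rows of length d and scaled into [-1, 1], the loss is plain weight-reconstruction error plus λ‖G‖_F², and all function names (mu_compand, mu_expand, glvq_forward, and so on) are hypothetical. The gradient updates of G_g and μ_g and the ε stopping rule are omitted because they require automatic differentiation. The authors' actual implementation is the one in their GitHub repository.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def mu_compand(x: float, mu: float) -> float:
    """mu-law companding F(x) = sign(x) * ln(1 + mu|x|) / ln(1 + mu), intended for |x| <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_expand(y: float, mu: float) -> float:
    """Inverse mu-law: F^{-1}(y) = sign(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def solve(G: Matrix, y: Vector) -> Vector:
    """Solve G x = y (G assumed invertible) by Gaussian elimination with partial pivoting."""
    n = len(G)
    A = [row[:] + [y[i]] for i, row in enumerate(G)]  # augmented matrix [G | y]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def matvec(G: Matrix, z: List[int]) -> Vector:
    """Lattice point G z for integer code vector z."""
    return [sum(g * zi for g, zi in zip(row, z)) for row in G]


def glvq_forward(W: Matrix, G: Matrix, mu: float, lam: float) -> Tuple[List[List[int]], Matrix, float]:
    """Forward pass of steps (i)-(iv) for one group (hypothetical helper).

    W: rows of length d = len(G), entries assumed in [-1, 1].
    Returns (integer codes Z, reconstruction W_hat, loss).
    """
    Z, W_hat = [], []
    for w in W:
        y = [mu_compand(v, mu) for v in w]                    # (i) companding -> latent vector
        z = [round(c) for c in solve(G, y)]                   # (ii) Babai rounding: z = round(G^{-1} y)
        lattice_pt = matvec(G, z)                             # lattice output G z
        W_hat.append([mu_expand(v, mu) for v in lattice_pt])  # (iii) inverse companding
        Z.append(z)
    recon = sum((a - b) ** 2 for w, wh in zip(W, W_hat) for a, b in zip(w, wh))
    frob = sum(g * g for row in G for g in row)
    return Z, W_hat, recon + lam * frob                       # (iv) loss with Frobenius penalty on G
```

Babai rounding here is the rounding-off variant, z = round(G⁻¹y): it is cheap but only approximates the closest lattice point when G is far from orthogonal. With a scaled identity G it reduces to uniform scalar quantization in the companded domain.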
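The Open Datasets and Experiment Setup rows report perplexity at context lengths of 2048 (Llama 1) and 4096 (Llama 2). A common way to compute this, assumed here rather than taken from the paper's evaluation scripts, splits the tokenized test set into non-overlapping windows of that length and exponentiates the mean per-token negative log-likelihood. The `log_probs` callable is a hypothetical stand-in for a real language model.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical model interface: given a window of token ids, return the natural-log
# probability of each token after the first one (len(window) - 1 values).
LogProbFn = Callable[[Sequence[int]], List[float]]


def windowed_perplexity(tokens: Sequence[int], log_probs: LogProbFn, ctx_len: int) -> float:
    """Perplexity over non-overlapping windows of ctx_len tokens.

    Returns exp(mean negative log-likelihood over all scored tokens).
    A trailing partial window is dropped.
    """
    total_nll, count = 0.0, 0
    for start in range(0, len(tokens) - ctx_len + 1, ctx_len):
        window = tokens[start:start + ctx_len]
        lp = log_probs(window)
        total_nll -= sum(lp)
        count += len(lp)
    if count == 0:
        raise ValueError("need at least ctx_len tokens")
    return math.exp(total_nll / count)
```

For instance, a model that assigns uniform probability over a 32,000-token vocabulary yields a perplexity of 32,000 regardless of the window length.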