Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Binary Quadratic Quantization: Beyond First-Order Quantization for Real-Valued Matrix Compression
Authors: Kyo Kuroki, Yasuyuki Okoshi, Thiem Van Chu, Kazushi Kawamura, Masato Motomura
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate our approach with two experiments: a matrix compression benchmark and post-training quantization (PTQ) on pretrained Vision Transformer-based models. Experimental results demonstrate that BQQ consistently achieves a superior trade-off between memory efficiency and reconstruction error than conventional methods for compressing diverse matrix data. It also delivers strong PTQ performance, even though we neither target state-of-the-art PTQ accuracy under tight memory constraints nor rely on PTQ-specific binary matrix optimization. |
| Researcher Affiliation | Academia | Kyo Kuroki (Institute of Science Tokyo); Yasuyuki Okoshi (Institute of Science Tokyo); Thiem Van Chu (Institute of Science Tokyo); Kazushi Kawamura (Waseda University); Masato Motomura (Institute of Science Tokyo) |
| Pseudocode | Yes | Algorithm 1: One Iteration of AMFD [36]; Algorithm 2: Subproblem Solving via AMFD; Algorithm 3: Greedy Binary Quadratic Quantization |
| Open Source Code | No | The code is not currently publicly available. We plan to release it after publication for reproducing the main results. |
| Open Datasets | Yes | Matrix Data Compression: We evaluate the trade-off between approximation error and memory size across five types of real-valued matrices: (i) a random matrix sampled from a Gaussian distribution, (ii) a weight matrix from the DeiT-S model [59], (iii) an inter-city distance matrix from the TSPLIB dataset [55], (iv) a matrix composed of multiple 128-dimensional feature vectors extracted from the SIFT dataset [28], commonly used in ANN search, (v) a red channel matrix of an image from the ImageNet dataset [10]. Each matrix is standardized to have zero mean and a variance of one prior to quantization. Table 1: Comparison of ImageNet top-1 accuracy across various quantization methods on ViTs. Table S.3: WikiText-2 perplexity and downstream task accuracy. |
| Dataset Splits | Yes | Correction of Bias and Normalization Parameters: After quantizing all weight matrices, we optionally apply a lightweight correction step using a small set of unlabeled calibration inputs. Similar to [5], we refine only the bias and layer normalization parameters while keeping all other parameters fixed by minimizing the mean squared error between the output logits of the original $f_{\mathrm{org}}$ and quantized models $f_q$, as a form of knowledge distillation: $\min_{\theta} \lVert f_{\mathrm{org}}(\theta_{\mathrm{org}}) - f_q(\theta) \rVert_2^2 / \lvert f_{\mathrm{org}}(\theta_{\mathrm{org}}) \rvert$, where $\theta$ denotes the bias and normalization parameters. This correction step compensates for quantization-induced errors and helps recover lost accuracy without requiring full fine-tuning or access to labeled data. ...calibration data are randomly selected from the ImageNet [10] training dataset, with 2048 samples for DeiTs and 1024 samples for Swins, in accordance with [68] setting. For both t-BQQ and GPTQ, the calibration data consist of the full training split of WikiText-2. |
| Hardware Specification | Yes | All experiments were conducted using the following environment: Python 3.9.19; PyTorch 2.6.0 with CUDA 12.4; four NVIDIA GeForce RTX 4090 GPUs; AMD EPYC 7313 16-Core Processor |
| Software Dependencies | Yes | All experiments were conducted using the following environment: Python 3.9.19; PyTorch 2.6.0 with CUDA 12.4; four NVIDIA GeForce RTX 4090 GPUs; AMD EPYC 7313 16-Core Processor |
| Experiment Setup | Yes | Implementation Details: As described in Eq. (6), BQQ decomposes a real-valued matrix of size $m \times n$ into binary matrices $Y_i \in \{0, 1\}^{m \times l}$ and $Z_i \in \{0, 1\}^{l \times n}$. To ensure a fair comparison with baseline methods like UQ and BCQ, we fix the intermediate dimension $l = \mathrm{round}(mn/(m + n))$ for all binary matrices. This ensures the total number of binary parameters matches that of UQ and BCQ, making $p$ in Eq. (6) the pseudo bit width. Another way to match the number of binary parameters is to adjust the ratio between the intermediate dimension $l$ and the number of stacks $p$ in Eq. (6); however, this paper adopts the approach described above. Unless otherwise noted, the hyperparameters used in Alg. 3 are set to the following values throughout all experiments: $T_{\mathrm{init}} = 0.2$, $T_{\mathrm{fin}} = 0.005$, $\eta = 0.06$, $\zeta = 4$, and $N_{\mathrm{step}} = 50{,}000$. Also, the scaling factor and the bias for UQ are optimized via grid search to minimize the mean squared error (MSE), as described in App. A.2. For BCQ, we implement the method based on [62], referring to parts of the open-source code provided in [65]. In the case of bias and normalization parameter correction (denoted as c-UQ, c-BCQ, and c-BQQ for each quantization method), we optimize them using the Adam optimizer [34] with a learning rate of 0.001 for 15 epochs via a minibatch size of 16, and calibration data are randomly selected from the ImageNet [10] training dataset, with 2048 samples for DeiTs and 1024 samples for Swins, in accordance with [68] setting. |
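
The parameter-matching rule quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names and the 384 x 1536 example shape are assumptions chosen for the demonstration, and Python's built-in `round` (which rounds exact halves to the nearest even integer) may differ from the rounding used in the paper. It counts only the binary entries of the $Y_i$ and $Z_i$ matrices, not any real-valued coefficients from Eq. (6).

```python
def intermediate_dim(m: int, n: int) -> int:
    """Intermediate dimension l = round(mn / (m + n)), as quoted from the paper."""
    return round(m * n / (m + n))


def bqq_binary_params(m: int, n: int, p: int) -> int:
    """Binary entries in p stacks of Y_i (m x l) and Z_i (l x n)."""
    l = intermediate_dim(m, n)
    return p * l * (m + n)


def uq_binary_params(m: int, n: int, bits: int) -> int:
    """Bits used by a plain `bits`-bit quantizer on an m x n matrix."""
    return bits * m * n


# Hypothetical example shape (an assumption, not taken from the paper).
m, n, p = 384, 1536, 2
l = intermediate_dim(m, n)
bqq_bits = bqq_binary_params(m, n, p)
uq_bits = uq_binary_params(m, n, p)
ratio = bqq_bits / uq_bits

print(f"l = {l}, BQQ bits = {bqq_bits}, UQ bits = {uq_bits}, ratio = {ratio:.5f}")
```

With this choice of $l$, each stack holds $l(m + n) \approx mn$ binary entries, so $p$ stacks use roughly the same number of bits as a $p$-bit quantizer, which is why $p$ can be read as a pseudo bit width.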