Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.

DartQuant: Efficient Rotational Distribution Calibration for LLM Quantization

Authors: Yuantian Shao, Yuanteng Chen, Peisong Wang, Jianlin Yu, Jing Lin, Yiwu Yao, Zhihui Wei, Jian Cheng

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental In a variety of model quantization experiments, DartQuant demonstrates superior performance. Compared to existing methods, it achieves 47× acceleration and 10× memory savings for rotational optimization on a 70B model. Furthermore, it is the first to successfully complete rotational calibration for a 70B model on a single 3090 GPU, making quantization of large language models feasible in resource-constrained environments. Code is available at https://github.com/CAS-CLab/DartQuant.git. 5 Experiment. Model and Dataset. We evaluate our method on the Llama series models, including Llama-2 (7B/13B/70B) [1] and Llama-3 (8B/70B). Moreover, we also provide results on two popular MoE models: Mixtral-8x7B [36] and Deepseek-MoE [37]. We report perplexity (PPL) scores on the WikiText2 [38], C4 [39], and PTB [40]. Additionally, we assess model performance on nine zero-shot evaluation tasks...
Researcher Affiliation Collaboration Yuantian Shao (1,2), Yuanteng Chen (2,3,4), Peisong Wang (2,3), Jianlin Yu (5), Jing Lin (5), Yiwu Yao (5), Zhihui Wei (1), Jian Cheng (2,3). Affiliations: (1) Nanjing University of Science and Technology; (2) C2DL, Institute of Automation, Chinese Academy of Sciences; (3) School of Artificial Intelligence, University of Chinese Academy of Sciences; (4) Zhongguancun Academy; (5) Huawei Technologies Co., Ltd.
Pseudocode Yes Algorithm 1 Rotational Distribution Calibration with QR-Orth Optimizer... Algorithm 2 Householder QR Decomposition... Algorithm 3 Cayley SGD with Momentum
Open Source Code Yes Code is available at https://github.com/CAS-CLab/DartQuant.git.
Open Datasets Yes We report perplexity (PPL) scores on the WikiText2 [38], C4 [39], and PTB [40]. Additionally, we assess model performance on nine zero-shot evaluation tasks, including LAMBADA [41], HellaSwag [42], PIQA [43], WinoGrande [44], OpenBookQA [45], SIQA [46], MMLU [47], ARC-E, and ARC-C [48].
Dataset Splits Yes In the main results, we apply GPTQ to reconstruct the weights. To do so, we use 128 samples from WikiText2, with a sequence length of 2048 tokens, as the calibration set for GPTQ, following the standard GPTQ setup. All activations are quantized using per-token asymmetric quantization. We optimize all orthogonal matrices using SGD combined with QR-Orth. During the orthogonal matrix calibration phase, we use 128 samples from WikiText2, each with a token length of 2048.
Hardware Specification Yes Table 3 presents a comparison of the optimization time and memory consumption of SpinQuant, OSTQuant, and DartQuant on an A800 GPU server. ... Moreover, DartQuant is the first to optimize the rotation matrix of the 70B model on a single 3090 GPU, with a calibration time of 3 hours.
Software Dependencies No The paper mentions using SGD and Adam optimizers, but does not specify any version numbers for these or other software libraries like PyTorch or TensorFlow.
Experiment Setup Yes Baselines and Implementation Details. In addition to the basic RTN method, we compare our approach with several other methods, including SmoothQuant [23], GPTQ [49], OmniQuant [19], and current state-of-the-art methods such as Quarot [25], SpinQuant [26] and OSTQuant [27] for weight and activation quantization. ... During the orthogonal matrix calibration phase, we use 128 samples from WikiText2, each with a token length of 2048. ... The specific hyperparameter settings for DartQuant are shown in Table 23. It is important to note that the latent parameter Z0 is initialized using a random Hadamard matrix.
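
Two ingredients quoted in the table, per-token asymmetric activation quantization and initialization from a random Hadamard matrix, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. It assumes the common construction of a random Hadamard matrix as a Sylvester Hadamard matrix with randomly sign-flipped columns, scaled by 1/sqrt(n), and it assumes a plain min/max range per token. The function names, the `bits` and `seed` parameters, and the default values are hypothetical.

```python
import random


def sylvester_hadamard(n):
    """Return an n x n Hadamard matrix with +1/-1 entries; n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester step: H_2k = [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def random_hadamard(n, seed=0):
    """Orthogonal matrix H @ diag(s) / sqrt(n) with random signs s (assumed construction)."""
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    scale = n ** -0.5
    return [[h_ij * s * scale for h_ij, s in zip(row, signs)]
            for row in sylvester_hadamard(n)]


def quantize_per_token_asymmetric(x, bits=4):
    """Fake-quantize each row (token) of x using that row's own min/max range."""
    levels = 2 ** bits - 1
    out = []
    for token in x:
        lo, hi = min(token), max(token)
        scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for constant rows
        q = [round((v - lo) / scale) for v in token]  # integer codes in [0, levels]
        out.append([qi * scale + lo for qi in q])  # map codes back to real values
    return out
```

Because each token gets its own offset and scale, a single outlier token does not widen the quantization range of the others; the rotation is meant to flatten the distribution within each token before this step.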