Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

LittleBit: Ultra Low-Bit Quantization via Latent Factorization

Authors: Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments confirm the superiority of LITTLEBIT in the sub-1-bit domain; for instance, our method at 0.1 BPW surpasses the performance of leading techniques operating at 0.7 BPW on Llama2-7B. We establish a new size-performance trade-off, unlocking a potential 11.6× inference speedup relative to FP16, and render powerful LLMs practical for resource-constrained environments.
Researcher Affiliation Industry Banseok Lee, Dongkyu Kim, Youngcheon You, Youngmin Kim; Samsung Research
Pseudocode No The paper describes the methodology in Section 3, detailing the LITTLEBIT architecture, Dual-SVID Initialization, and Residual Compensation using mathematical formulations and descriptive text, but does not include a distinct pseudocode or algorithm block.
Open Source Code Yes Our code is available at https://github.com/SamsungLabs/LittleBit.
Open Datasets Yes Perplexity (PPL) on the WikiText-2 [52] validation dataset served as the primary performance metric. Appendix C provides additional results on the C4 [53] and PTB [54] datasets.
Dataset Splits Yes Evaluation Setup: We evaluated LITTLEBIT across diverse LLM families, including Llama [46], Llama2 [47], Llama3 [48], OPT [49], Phi-4 [50], and QwQ [51]. These models span parameter scales from 1.3B to 32B. Perplexity (PPL) on the WikiText-2 [52] validation dataset served as the primary performance metric. Appendix C provides additional results on the C4 [53] and PTB [54] datasets. Training Details: Adhering to the protocol in [24], the training data combined WikiText-2 with selected partitions from C4.
Hardware Specification Yes Training was conducted using four H100 GPUs for all models except QwQ-32B, which required 4×8 A100 GPUs.
Software Dependencies No The paper mentions the use of an Adam optimizer [63] and a custom CUDA kernel, but does not provide specific version numbers for any software libraries or frameworks like Python, PyTorch, or CUDA.
Experiment Setup Yes Training Details: We optimized the LITTLEBIT model parameters, initialized via the Dual-SVID method (Section 3.2), using QAT with knowledge distillation (KD) [61, 36, 62]. The original pretrained full-precision model functioned as the teacher (T) for the LITTLEBIT student model (S). The QAT objective combines the standard output Kullback-Leibler (KL) divergence loss, $\mathcal{L}_{\text{out}}$, and an intermediate layer mean squared error (MSE) loss, $\mathcal{L}_{\text{inter}}$, to match hidden representations. We weighted these terms using an empirically determined coefficient $\lambda = 10$: $\mathcal{L}_{\text{QAT}} = \mathcal{L}_{\text{out}} + \lambda \mathcal{L}_{\text{inter}}$ (11). Adhering to the protocol in [24], the training data combined WikiText-2 with selected partitions from C4. The configuration included a sequence length of 2048 tokens, 5 epochs, the Adam optimizer ($\beta_1 = 0.9$, $\beta_2 = 0.999$) [63], and a cosine learning rate decay with 2% warm-up (see Appendix G for details).
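
The Experiment Setup entry describes two mechanisms: the combined objective of Equation (11) and a cosine learning rate decay with 2% warm-up. The sketch below illustrates both in plain Python. It is not the authors' implementation, and several details are assumptions not stated in the quoted text: the KL direction (teacher relative to student), the absence of a distillation temperature, a single token position and a single hidden vector rather than batched tensors, a linear warm-up shape, decay to zero, and the hypothetical `peak_lr` argument (the actual learning rate is deferred to Appendix G). All function names are illustrative.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one token position (direction is assumed)."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mse(teacher_hidden, student_hidden):
    """Mean squared error between two hidden vectors of equal length."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(diffs) / len(diffs)


def qat_loss(teacher_logits, student_logits,
             teacher_hidden, student_hidden, lam=10.0):
    """L_QAT = L_out + lam * L_inter, with lam = 10 as quoted above."""
    l_out = kl_divergence(teacher_logits, student_logits)
    l_inter = mse(teacher_hidden, student_hidden)
    return l_out + lam * l_inter


def cosine_lr(step, total_steps, peak_lr, warmup_frac=0.02):
    """Linear warm-up over the first 2% of steps, then cosine decay to zero.

    The warm-up shape and the zero floor are assumptions; peak_lr is a
    hypothetical value supplied by the caller.
    """
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))
```

With identical teacher and student outputs both terms vanish, so `qat_loss` returns zero; with matching logits but hidden vectors `[1.0, 2.0]` and `[0.0, 2.0]`, the MSE is 0.5 and the weighted loss is 5.0. In an actual training loop these quantities would be computed on batched tensors and averaged over tokens and layers.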