OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models

Authors: Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, Ping Luo

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments validate OmniQuant's superior performance across diverse quantization configurations such as W4A4 (4-bit weight, 4-bit activation), W6A6, W4A16, W3A16, and W2A16. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices.
Researcher Affiliation | Collaboration | 1OpenGVLab, Shanghai AI Laboratory; 2The University of Hong Kong; 3The Chinese University of Hong Kong
Pseudocode | Yes | Algorithm 1: Overall algorithm of OmniQuant.
Open Source Code | Yes | Codes are available at https://github.com/OpenGVLab/OmniQuant.
Open Datasets | Yes | We employ a calibration dataset consisting of 128 randomly selected 2048-token segments from WikiText2 (Merity et al., 2016). Evaluation: following previous work (Lin et al., 2023; Frantar et al., 2022), we evaluate quantized models by reporting the perplexity of language generation experiments, specifically on WikiText2 (Merity et al., 2016), PTB (Marcus et al., 1994), and C4 (Raffel et al., 2020).
Dataset Splits | No | No explicit validation dataset split is mentioned. The paper uses a "calibration dataset consisting of 128 randomly selected 2048-token segments from WikiText2" for optimizing quantization parameters, and then evaluates on various test datasets.
Hardware Specification | Yes | For instance, the LLaMA-2 model family (7B-70B) can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples. The entire training process is facilitated on a single Nvidia A100 GPU, using a batch size of 1 over 20 epochs. Table 3 shows memory requirements and inference speeds of the LLaMA family on an NVIDIA A100-80G.
Software Dependencies | No | No specific software dependencies with version numbers are listed in the paper.
Experiment Setup | Yes | To optimize the learnable parameters, we utilize the AdamW optimizer with zero weight decay. The learning rate for learnable weight clipping and equivalent transformation is set as 5e-3 and 1e-2, respectively. We employ a calibration dataset consisting of 128 randomly selected 2048-token segments from WikiText2 (Merity et al., 2016). The entire training process is facilitated on a single Nvidia A100 GPU, using a batch size of 1 over 20 epochs, except for W2A16 quantization, which leverages 40 epochs.
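As a rough illustration of the reported setup (128 random 2048-token WikiText2 segments for calibration; AdamW with zero weight decay, lr 5e-3 for learnable weight clipping and 1e-2 for the equivalent transformation), the following is a minimal sketch assuming PyTorch and the Hugging Face datasets/transformers libraries. The function names and the split of parameters into clipping and transformation groups are illustrative placeholders, not the authors' released code.

```python
import random

import torch
from datasets import load_dataset
from transformers import AutoTokenizer


def build_calibration_set(tokenizer, n_samples=128, seq_len=2048, seed=0):
    """Sample n_samples random seq_len-token segments from the WikiText2 train split."""
    raw = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")
    tokens = tokenizer("\n\n".join(raw["text"]), return_tensors="pt").input_ids
    rng = random.Random(seed)
    segments = []
    for _ in range(n_samples):
        start = rng.randint(0, tokens.shape[1] - seq_len - 1)
        segments.append(tokens[:, start:start + seq_len])
    return segments


def make_optimizer(clipping_params, transform_params):
    """AdamW with zero weight decay; lr 5e-3 for learnable weight clipping and
    1e-2 for equivalent transformation, per the reported hyperparameters."""
    return torch.optim.AdamW(
        [
            {"params": clipping_params, "lr": 5e-3},
            {"params": transform_params, "lr": 1e-2},
        ],
        weight_decay=0.0,
    )


# Hypothetical usage (batch size 1 over 20 epochs per the reported setup; 40 for W2A16):
# tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
# calib = build_calibration_set(tokenizer)
```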