Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Authors: Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models. |
| Researcher Affiliation | Industry | Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon Samsung Research EMAIL |
| Pseudocode | Yes | In Appendix A, we provide the pseudo-code for the proposed aespa excluded in the main text due to the page limitation. Algorithm 1 Quantization |
| Open Source Code | Yes | The code will be available at https://github.com/SamsungLabs/aespa. |
| Open Datasets | Yes | When constructing the calibration dataset, we randomly sample 128 segments consisting of 2048 tokens from the C4 dataset [24] as in [7, 13, 3]. |
| Dataset Splits | No | The paper mentions 'calibration dataset' and 'benchmark datasets (e.g., WikiText-2 [22], C4 [24], and PTB [21])' but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for the evaluation data. |
| Hardware Specification | Yes | Except for the experiments on the LLaMA2 models, which were performed using an NVIDIA H100 GPU, we conducted all experiments using a single NVIDIA A100 GPU (80 GB). |
| Software Dependencies | No | The paper mentions using 'Z-FOLD [13]' and 'AdaRound [23]' for implementing aespa, but does not specify software versions for these or any other libraries/environments (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | When optimizing a weight-rounding policy, we set the number of iterations, learning rate, and weight of the rounding loss (see λ in (28)) to 2,000, 0.015, and 1.5, respectively. |
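
The calibration and setup values quoted in the table (128 segments of 2048 tokens; 2,000 iterations, learning rate 0.015, rounding-loss weight 1.5) can be collected in a minimal sketch. This is not the authors' code: the function name `sample_calibration_segments`, the dictionary `ROUNDING_CONFIG`, the `seed` parameter, and the choice to draw contiguous windows from a single pre-tokenized stream are all assumptions made for illustration, and a synthetic list of integers stands in for tokenized C4 text.

```python
import random


def sample_calibration_segments(token_ids, num_segments=128, segment_len=2048, seed=0):
    """Draw random contiguous windows from one long token sequence.

    Hypothetical helper: the paper only states that 128 segments of 2048
    tokens are sampled from C4; the windowing strategy here is an assumption.
    """
    if len(token_ids) < segment_len:
        raise ValueError("token stream is shorter than one segment")
    rng = random.Random(seed)  # local RNG so sampling is repeatable
    max_start = len(token_ids) - segment_len
    segments = []
    for _ in range(num_segments):
        start = rng.randint(0, max_start)  # randint is inclusive on both ends
        segments.append(token_ids[start:start + segment_len])
    return segments


# Rounding-policy settings quoted in the Experiment Setup row.
# The key names are illustrative, not taken from the paper or its repository.
ROUNDING_CONFIG = {
    "iterations": 2000,
    "learning_rate": 0.015,
    "rounding_loss_weight": 1.5,  # lambda in the paper's Eq. (28)
}


if __name__ == "__main__":
    fake_tokens = list(range(300_000))  # stand-in for tokenized C4 text
    calib = sample_calibration_segments(fake_tokens)
    print(len(calib), len(calib[0]))
```

Fixing the seed keeps the calibration set identical across runs, which matters for reproducibility because the table records no explicit dataset splits.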