Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

Authors: Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models.
Researcher Affiliation | Industry | Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon, Samsung Research, {jun_one.kim, chungman.lee, dragwon.jeon}@samsung.com
Pseudocode | Yes | In Appendix A, we provide the pseudo-code for the proposed aespa, which was excluded from the main text due to the page limit. Algorithm 1: Quantization
Open Source Code | Yes | The code will be available at https://github.com/SamsungLabs/aespa.
Open Datasets | Yes | When constructing the calibration dataset, we randomly sample 128 segments consisting of 2048 tokens from the C4 dataset [24], as in [7, 13, 3]. (See the calibration-sampling sketch below.)
Dataset Splits | No | The paper mentions a 'calibration dataset' and 'benchmark datasets (e.g., WikiText-2 [22], C4 [24], and PTB [21])' but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for the evaluation data.
Hardware Specification | Yes | Except for the experiments on the LLaMA2 models, which were performed using an NVIDIA H100 GPU, we conducted all experiments using a single NVIDIA A100 GPU (80 GB).
Software Dependencies | No | The paper mentions using 'Z-FOLD [13]' and 'AdaRound [23]' for implementing aespa, but does not specify software versions for these or any other libraries/environments (e.g., Python, PyTorch versions).
Experiment Setup | Yes | When optimizing a weight-rounding policy, we set the number of iterations, learning rate, and weight of the rounding loss (see λ in (28)) to 2,000, 0.015, and 1.5, respectively. (See the rounding-optimization sketch below.)
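
The calibration setup quoted in the Open Datasets row (128 random segments of 2,048 tokens each from C4) can be reproduced with a short sampling routine. The sketch below is a minimal illustration, not the authors' released code: the `facebook/opt-1.3b` tokenizer and the `sample_c4_calibration` helper are hypothetical choices, and the snippet assumes the Hugging Face `allenai/c4` copy of the dataset.

```python
import random

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

# Hypothetical model choice; the paper quantizes several LLM families (e.g., OPT, LLaMA).
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

def sample_c4_calibration(n_samples=128, seq_len=2048, seed=0):
    """Draw n_samples random windows of seq_len tokens from the C4 training split."""
    random.seed(seed)
    stream = load_dataset("allenai/c4", "en", split="train", streaming=True)
    segments = []
    for example in stream:
        ids = tokenizer(example["text"], return_tensors="pt").input_ids
        if ids.shape[1] <= seq_len:
            continue  # skip documents shorter than one full segment
        start = random.randint(0, ids.shape[1] - seq_len - 1)
        segments.append(ids[:, start:start + seq_len])
        if len(segments) == n_samples:
            break
    return torch.cat(segments, dim=0)  # shape: (128, 2048) token ids

calibration_ids = sample_c4_calibration()
```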
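
The Experiment Setup row reports the rounding-policy hyperparameters (2,000 iterations, learning rate 0.015, rounding-loss weight λ = 1.5). Below is a minimal AdaRound-style sketch that plugs in those values; it is not aespa's actual objective (the paper's Eq. (28) reformulates the loss), and the `optimize_rounding` helper, the fixed regularizer sharpness `beta`, and the plain layer-output reconstruction target are assumptions for illustration.

```python
import torch

def rectified_sigmoid(v, zeta=1.1, gamma=-0.1):
    """Stretched sigmoid from AdaRound; relaxes the binary rounding decision to [0, 1]."""
    return torch.clamp(torch.sigmoid(v) * (zeta - gamma) + gamma, 0.0, 1.0)

def optimize_rounding(W, X, scale, n_iters=2000, lr=0.015, lam=1.5, beta=20.0):
    """Learn an up/down rounding policy that minimizes layer-output reconstruction error.

    W: (out_features, in_features) full-precision weights
    X: (n_tokens, in_features) calibration activations for this layer
    scale: quantization step size (scalar or tensor broadcastable to W)
    """
    W_floor = torch.floor(W / scale)
    target = X @ W.T                                   # full-precision layer output
    # AdaRound initializes v from the fractional residual; zeros are kept here for brevity.
    v = torch.zeros_like(W, requires_grad=True)        # continuous rounding variable
    opt = torch.optim.Adam([v], lr=lr)
    for _ in range(n_iters):
        h = rectified_sigmoid(v)
        W_soft = (W_floor + h) * scale                 # soft-quantized weights
        recon = ((X @ W_soft.T - target) ** 2).mean()  # reconstruction loss
        f_reg = (1.0 - (2.0 * h - 1.0).abs().pow(beta)).sum()  # push h toward {0, 1}
        loss = recon + lam * f_reg
        opt.zero_grad()
        loss.backward()
        opt.step()
    h_hard = (rectified_sigmoid(v) > 0.5).float()      # final binary rounding decision
    return (W_floor + h_hard) * scale

```

In the original AdaRound recipe, `beta` is also annealed over the course of optimization; it is held fixed above to keep the sketch short.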