Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers
Authors: Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on various language models and complexity analysis, we demonstrate that aespa is accurate and efficient in quantizing Transformer models. |
| Researcher Affiliation | Industry | Junhan Kim, Chungman Lee, Eulrang Cho, Kyungphil Park, Ho-young Kim, Joonyoung Kim, Yongkweon Jeon, Samsung Research {jun_one.kim, chungman.lee, dragwon.jeon}@samsung.com |
| Pseudocode | Yes | In Appendix A, we provide the pseudo-code for the proposed aespa, omitted from the main text due to the page limit (Algorithm 1: Quantization). |
| Open Source Code | Yes | The code will be available at https://github.com/SamsungLabs/aespa. |
| Open Datasets | Yes | When constructing the calibration dataset, we randomly sample 128 segments consisting of 2048 tokens from the C4 dataset [24] as in [7, 13, 3]. (A sampling sketch follows the table.) |
| Dataset Splits | No | The paper mentions 'calibration dataset' and 'benchmark datasets (e.g., WikiText-2 [22], C4 [24], and PTB [21])' but does not specify explicit training/validation/test splits (e.g., percentages or sample counts) for the evaluation data. |
| Hardware Specification | Yes | Except for the experiments on the LLaMA2 models, which were performed using an NVIDIA H100 GPU, we conducted all experiments using a single NVIDIA A100 GPU (80 GB). |
| Software Dependencies | No | The paper mentions using 'Z-FOLD [13]' and 'AdaRound [23]' for implementing aespa, but does not specify software versions for these or any other libraries/environments (e.g., Python, PyTorch versions). |
| Experiment Setup | Yes | When optimizing a weight-rounding policy, we set the number of iterations, learning rate, and weight of the rounding loss (see λ in (28)) to 2,000, 0.015, and 1.5, respectively. (A rounding-loss sketch follows the table.) |
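The "Open Datasets" row quotes a GPTQ-style calibration recipe: 128 random segments of 2,048 tokens each, drawn from C4. A minimal sketch of that sampling step is below, assuming a Hugging Face tokenizer and the streamed `allenai/c4` corpus; the checkpoint name and random seed are illustrative assumptions, not details from the paper.

```python
import random

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

SEED, NUM_SAMPLES, SEQ_LEN = 0, 128, 2048
random.seed(SEED)

# Checkpoint is an illustrative assumption; the paper evaluates several LLMs.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-1.3b")

# Stream C4 so the full corpus never needs to be downloaded.
c4 = load_dataset("allenai/c4", "en", split="train", streaming=True)

calib = []
for sample in c4:
    ids = tokenizer(sample["text"], return_tensors="pt").input_ids
    if ids.shape[1] <= SEQ_LEN:
        continue  # skip documents shorter than one segment
    start = random.randint(0, ids.shape[1] - SEQ_LEN - 1)
    calib.append(ids[:, start : start + SEQ_LEN])
    if len(calib) == NUM_SAMPLES:
        break

calibration_batch = torch.cat(calib, dim=0)  # shape: (128, 2048)
```

Each kept document contributes one randomly positioned 2,048-token window, matching the segment count and length quoted from the paper.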
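The "Experiment Setup" row gives the rounding-optimization hyperparameters (2,000 iterations, learning rate 0.015, rounding-loss weight λ = 1.5) but not the objective itself. Since the paper builds on AdaRound [23], below is a compact AdaRound-style sketch wired with those numbers; the layer reconstruction target, the β annealing schedule, and all helper names are assumptions rather than the paper's eq. (28).

```python
import torch

ITERS, LR, LAMBDA = 2_000, 0.015, 1.5  # values quoted in the table above

def rectified_sigmoid(v, zeta=1.1, gamma=-0.1):
    # Stretched sigmoid from AdaRound, clamped to [0, 1].
    return torch.clamp(torch.sigmoid(v) * (zeta - gamma) + gamma, 0, 1)

def optimize_rounding(w_fp, scale, x_calib):
    """Learn a per-weight rounding policy for one linear layer (sketch)."""
    w_floor = torch.floor(w_fp / scale)
    v = torch.zeros_like(w_fp, requires_grad=True)  # rounding logits
    opt = torch.optim.Adam([v], lr=LR)
    y_ref = x_calib @ w_fp.t()  # full-precision reference output

    for step in range(ITERS):
        h = rectified_sigmoid(v)
        w_q = (w_floor + h) * scale  # soft-quantized weights
        recon = ((x_calib @ w_q.t() - y_ref) ** 2).mean()
        beta = 20 - (20 - 2) * step / ITERS  # anneal toward hard rounding
        round_reg = (1 - (2 * h - 1).abs().pow(beta)).sum()
        loss = recon + LAMBDA * round_reg
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Final hard decision: round up where h > 0.5, down otherwise
    # (clamping to the integer grid range is omitted for brevity).
    return (w_floor + (rectified_sigmoid(v) > 0.5).float()) * scale
```

At initialization the soft rounding h(V) sits at 0.5; annealing β pushes each weight toward a hard round-up or round-down decision over the 2,000 iterations, with λ = 1.5 weighting that pressure against the reconstruction error.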