AffineQuant: Affine Transformation Quantization for Large Language Models
Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As demonstrated in Tables 1 and 3, we observe consistent performance improvements across all models with various quantization configurations. This indicates that AffineQuant is not reliant on a particular quantization configuration. Notably, AffineQuant exhibits significant improvements, particularly in cases of low-bit quantization or smaller model sizes. |
| Researcher Affiliation | Collaboration | Yuexiao Ma1, Huixia Li2, Xiawu Zheng1,3,4, Feng Ling2, Xuefeng Xiao2, Rui Wang2, Shilei Wen2, Fei Chao1, Rongrong Ji1,4 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. 2 ByteDance Inc. 3 Peng Cheng Laboratory, Shenzhen, China. 4 Institute of Artificial Intelligence, Xiamen University. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/bytedance/AffineQuant |
| Open Datasets | Yes | The model's performance is evaluated on the WikiText2 (Merity et al., 2016), PTB (Marcus et al., 1994), and C4 (Raffel et al., 2020) datasets. |
| Dataset Splits | No | The paper mentions selecting '128 segments from the WikiText2 training set, each containing 2048 tokens, as the calibration dataset,' but it does not provide explicit training/validation/test splits (e.g., percentages or exact counts) for all datasets used to reproduce the data partitioning. A calibration-sampling sketch is given after this table. |
| Hardware Specification | Yes | The optimization process is performed on an Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and 'MLC-LLM' but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We leverage the scale of SmoothQuant (Xiao et al., 2023) to initialize the diagonal of the affine transformation matrix. As the affine transformation is orthogonal to the translation operation, we incorporate the optimization of the learnable parameter shift and initialize it using Outlier Suppression+ (Wei et al., 2023). Our optimizer, learning rate, epoch, and learnable clipping of quantization parameters are consistent with OmniQuant (Shao et al., 2023). An initialization sketch is given after this table. |
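
The calibration procedure quoted in the Dataset Splits row (128 random segments of 2048 tokens drawn from the WikiText2 training split) can be reproduced roughly as follows. This is a minimal sketch assuming the Hugging Face `datasets` and `transformers` packages; the function name `build_calibration_set` and its arguments are illustrative and not taken from the AffineQuant repository.

```python
# Sketch of the calibration-set construction: 128 random 2048-token segments
# from the WikiText2 training split. Names here are illustrative assumptions.
import random
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

def build_calibration_set(model_name: str, n_samples: int = 128, seq_len: int = 2048):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Concatenate the raw training text and tokenize it once.
    train_text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
    ids = tokenizer(train_text, return_tensors="pt").input_ids[0]
    samples = []
    for _ in range(n_samples):
        # Draw a random contiguous window of seq_len tokens.
        start = random.randint(0, ids.numel() - seq_len - 1)
        samples.append(ids[start : start + seq_len].unsqueeze(0))
    return torch.cat(samples, dim=0)  # shape: (n_samples, seq_len)
```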
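The initialization described in the Experiment Setup row (a SmoothQuant-style scale on the diagonal of the affine transformation matrix, plus a learnable channel shift in the spirit of Outlier Suppression+) can be sketched as below. This is an illustrative PyTorch snippet under assumed per-channel statistics; `init_affine_and_shift`, its arguments, and the choice of `alpha = 0.5` are assumptions rather than the paper's exact implementation.

```python
# Minimal sketch: initialize the affine matrix as a diagonal SmoothQuant-style
# scaling and the translation (shift) from per-channel activation means.
# All names and the statistics interface are illustrative assumptions.
import torch
import torch.nn as nn

def init_affine_and_shift(act_absmax, weight_absmax, act_mean, alpha: float = 0.5):
    # act_absmax, weight_absmax, act_mean: per-channel statistics of shape (hidden,).
    scale = (act_absmax.pow(alpha) / weight_absmax.pow(1 - alpha)).clamp(min=1e-5)
    # Diagonal initialization; whether the diagonal holds s or 1/s depends on
    # which side of the matmul the affine matrix is merged into.
    affine = torch.diag(scale)
    shift = -act_mean  # translate each channel toward zero mean before quantization
    return nn.Parameter(affine), nn.Parameter(shift)
```

Starting from a diagonal matrix keeps the initial transform equivalent to a plain per-channel scaling, so the subsequent optimization only has to learn the off-diagonal mixing on top of a known-good starting point.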