AffineQuant: Affine Transformation Quantization for Large Language Models
Authors: Yuexiao Ma, Huixia Li, Xiawu Zheng, Feng Ling, Xuefeng Xiao, Rui Wang, Shilei Wen, Fei Chao, Rongrong Ji
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As demonstrated in Tables 1 and 3, we observe consistent performance improvements across all models with various quantization configurations. This indicates that AffineQuant is not reliant on a particular quantization configuration. Notably, AffineQuant exhibits significant improvements, particularly in cases of low-bit quantization or smaller model sizes. |
| Researcher Affiliation | Collaboration | Yuexiao Ma1, Huixia Li2, Xiawu Zheng1,3,4, Feng Ling2, Xuefeng Xiao2, Rui Wang2, Shilei Wen2, Fei Chao1, Rongrong Ji1,4 1 Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, P.R. China. 2 ByteDance Inc. 3 Peng Cheng Laboratory, Shenzhen, China. 4 Institute of Artificial Intelligence, Xiamen University. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code is available at: https://github.com/bytedance/AffineQuant |
| Open Datasets | Yes | The model's performance is evaluated on the WikiText2 (Merity et al., 2016), PTB (Marcus et al., 1994), and C4 (Raffel et al., 2020) datasets. |
| Dataset Splits | No | The paper mentions selecting '128 segments from the WikiText2 training set, each containing 2048 tokens, as the calibration dataset,' but it does not provide explicit training/validation/test splits (e.g., percentages or exact counts) for all datasets used to reproduce the data partitioning. A calibration-sampling sketch is given after this table. |
| Hardware Specification | Yes | The optimization process is performed on an Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and 'MLC-LLM' but does not specify exact version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We leverage the scale of SmoothQuant (Xiao et al., 2023) to initialize the diagonal of the affine transformation matrix. As the affine transformation is orthogonal to the translation operation, we incorporate the optimization of the learnable parameter shift and initialize it using Outlier Suppression+ (Wei et al., 2023). Our optimizer, learning rate, epoch, and learnable clipping of quantization parameters are consistent with OmniQuant (Shao et al., 2023). An initialization sketch is given after this table. |
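
The calibration procedure quoted in the Dataset Splits row (128 random segments of 2048 tokens drawn from the WikiText2 training split) can be reproduced roughly as follows. This is a minimal sketch assuming the Hugging Face `datasets` and `transformers` packages; the function name `build_calibration_set` and its arguments are illustrative and not taken from the AffineQuant repository.

```python
# Sketch of the calibration-set construction: 128 random 2048-token segments
# from the WikiText2 training split. Names here are illustrative assumptions.
import random
import torch
from datasets import load_dataset
from transformers import AutoTokenizer

def build_calibration_set(model_name: str, n_samples: int = 128, seq_len: int = 2048):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # Concatenate the raw training text and tokenize it once.
    train_text = "\n\n".join(load_dataset("wikitext", "wikitext-2-raw-v1", split="train")["text"])
    ids = tokenizer(train_text, return_tensors="pt").input_ids[0]
    samples = []
    for _ in range(n_samples):
        # Draw a random contiguous window of seq_len tokens.
        start = random.randint(0, ids.numel() - seq_len - 1)
        samples.append(ids[start : start + seq_len].unsqueeze(0))
    return torch.cat(samples, dim=0)  # shape: (n_samples, seq_len)
```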
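The initialization described in the Experiment Setup row (a SmoothQuant-style scale on the diagonal of the affine transformation matrix, plus a learnable channel shift in the spirit of Outlier Suppression+) can be sketched as below. This is an illustrative PyTorch snippet under assumed per-channel statistics; `init_affine_and_shift`, its arguments, and the choice of `alpha = 0.5` are assumptions rather than the paper's exact implementation.

```python
# Minimal sketch: initialize the affine matrix as a diagonal SmoothQuant-style
# scaling and the translation (shift) from per-channel activation means.
# All names and the statistics interface are illustrative assumptions.
import torch
import torch.nn as nn

def init_affine_and_shift(act_absmax, weight_absmax, act_mean, alpha: float = 0.5):
    # act_absmax, weight_absmax, act_mean: per-channel statistics of shape (hidden,).
    scale = (act_absmax.pow(alpha) / weight_absmax.pow(1 - alpha)).clamp(min=1e-5)
    # Diagonal initialization; whether the diagonal holds s or 1/s depends on
    # which side of the matmul the affine matrix is merged into.
    affine = torch.diag(scale)
    shift = -act_mean  # translate each channel toward zero mean before quantization
    return nn.Parameter(affine), nn.Parameter(shift)
```

Starting from a diagonal matrix keeps the initial transform equivalent to a plain per-channel scaling, so the subsequent optimization only has to learn the off-diagonal mixing on top of a known-good starting point.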