PowerQuant: Automorphism Search for Non-Uniform Quantization

Authors: Edouard Yvinec, Arnaud Dapogny, Matthieu Cord, Kevin Bailly

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 (Experiments): In this section, we empirically validate our method. First, we discuss the optimization of the exponent parameter a of PowerQuant using the reconstruction error, showing that it serves as a useful proxy for the quantized model accuracy from an experimental standpoint. We show that the proposed approach preserves this reconstruction error significantly better, allowing a closer fit to the original weight distribution through non-uniform quantization. Second, we show through a variety of benchmarks that the proposed approach significantly outperforms state-of-the-art data-free methods, thanks to more efficient power-function quantization with an optimized exponent. Third, we show that the proposed approach comes at a negligible cost in terms of inference speed. (A sketch of this exponent search is given after the table.)
Researcher Affiliation | Collaboration | Sorbonne Université, CNRS, ISIR, 4 Place Jussieu, F-75005 Paris, France; Datakalab, 114 boulevard Malesherbes, 75017 Paris, France
Pseudocode | Yes | Algorithm 1: Weight Quantization Algorithm (an illustrative quantizer sketch is given after the table).
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the described method.
Open Datasets | Yes | We validate the proposed PowerQuant method on ImageNet classification (Deng et al., 2009) (1.2M training images / 50k test images).
Dataset Splits | No | The paper states '1.2M images train/50k test' for ImageNet but does not explicitly describe a validation split.
Hardware Specification | Yes | Table 17: Inference time, in seconds, over ImageNet using batches of size 16 for several networks on an RTX 2070 GPU (a timing sketch is given after the table).
Software Dependencies | No | The paper mentions 'Tensorflow implementations' and the 'Numpy library' but does not specify version numbers for these or any other software dependencies.
Experiment Setup | No | The paper mentions 'batches of size 16' and describes some quantization settings (unsigned integers for activations, a symmetric representation for weights, and batch-normalization folding), but it does not provide specific hyperparameter values such as learning rate, number of epochs, or optimizer settings (a folding and quantization-setup sketch is given after the table).
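
The following is a minimal NumPy sketch of a power-function weight quantizer in the spirit of the paper's Algorithm 1. It assumes the non-uniform step maps weights through sign(w)·|w|^a, applies symmetric uniform quantization in that domain, and then inverts the transform; per-channel scaling and other details of the actual algorithm may differ.

```python
import numpy as np

def power_quantize_dequantize(w, a, n_bits=8):
    """Quantize-dequantize a weight tensor with a power-function non-uniformity.

    Sketch only: sign(w) * |w|**a maps the weights into the power domain,
    where a symmetric signed uniform grid (as in the paper's experiment
    setup) is applied; the inverse power 1/a maps back.
    """
    q_max = 2 ** (n_bits - 1) - 1                 # symmetric signed range, e.g. 127 for 8 bits
    w_pow = np.sign(w) * np.abs(w) ** a           # power (non-uniform) domain
    scale = np.max(np.abs(w_pow)) / q_max         # per-tensor symmetric scale (assumption)
    w_int = np.clip(np.round(w_pow / scale), -q_max, q_max)
    w_hat_pow = w_int * scale                     # dequantize in the power domain
    return np.sign(w_hat_pow) * np.abs(w_hat_pow) ** (1.0 / a)
```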
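
Section 4 uses the weight reconstruction error as a proxy for quantized accuracy when choosing the exponent a. A hedged sketch of such a search, reusing power_quantize_dequantize from the block above, is shown below; the candidate grid is an assumption, not a value from the paper.

```python
import numpy as np

def search_exponent(weight_tensors, candidates=np.linspace(0.5, 4.0, 36), n_bits=8):
    """Pick the exponent a that minimizes the total weight reconstruction error.

    ||W - W_hat||^2 serves as a proxy for the quantized model accuracy;
    `weight_tensors` is the list of weight arrays of the model to quantize.
    """
    best_a, best_err = None, float("inf")
    for a in candidates:
        err = sum(
            float(np.sum((w - power_quantize_dequantize(w, a, n_bits)) ** 2))
            for w in weight_tensors
        )
        if err < best_err:
            best_a, best_err = a, err
    return best_a
```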
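
The experiment-setup row mentions batch-normalization folding, symmetric weights, and unsigned activations. Below is a minimal NumPy sketch of the folding and activation steps; the TensorFlow-style (kh, kw, c_in, c_out) kernel layout and the epsilon value are assumptions, not values reported in the paper.

```python
import numpy as np

def fold_batch_norm(kernel, bias, gamma, beta, mean, var, eps=1e-3):
    """Fold a batch-normalization layer into the preceding convolution.

    kernel has TensorFlow layout (kh, kw, c_in, c_out); the per-channel
    scale gamma / sqrt(var + eps) broadcasts over the last axis.
    """
    scale = gamma / np.sqrt(var + eps)
    return kernel * scale, beta + (bias - mean) * scale

def quantize_activation(x, n_bits=8):
    """Unsigned uniform quantization of non-negative (post-ReLU) activations."""
    q_max = 2 ** n_bits - 1
    scale = max(float(np.max(x)), 1e-12) / q_max
    return np.clip(np.round(x / scale), 0, q_max) * scale
```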
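
Table 17's inference-time protocol (batches of size 16 on a single GPU) could be reproduced along the lines of the sketch below. The network, the number of timed runs, and the use of tf.keras are assumptions; the paper only states that TensorFlow implementations were used.

```python
import time
import numpy as np
import tensorflow as tf  # version unspecified in the paper

# Stand-in network and input; the paper benchmarks several architectures on a 2070 RTX GPU.
model = tf.keras.applications.MobileNetV2(weights=None)
batch = np.random.rand(16, 224, 224, 3).astype(np.float32)  # batch size 16, as in Table 17

model(batch)          # warm-up so one-time graph tracing is not timed
n_runs = 50           # number of timed runs is an assumption
start = time.perf_counter()
for _ in range(n_runs):
    _ = model(batch).numpy()   # .numpy() forces the computation to complete
print(f"average inference time per batch: {(time.perf_counter() - start) / n_runs:.4f} s")
```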