OMPQ: Orthogonal Mixed Precision Quantization

Authors: Yuexiao Ma, Taisong Jin, Xiawu Zheng, Yan Wang, Huixia Li, Yongjian Wu, Guannan Jiang, Wei Zhang, Rongrong Ji

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach significantly reduces the search time and the required data amount by orders of magnitude, but without a compromise on quantization accuracy. Specifically, we achieve 72.08% Top-1 accuracy on ResNet-18 with 6.7Mb parameters, which does not require any searching iterations. Given the high efficiency and low data dependency of our algorithm, we use it for post-training quantization, which achieves 71.27% Top-1 accuracy on MobileNetV2 with only 1.5Mb parameters.
Researcher Affiliation | Collaboration | Yuexiao Ma (1), Taisong Jin (2*), Xiawu Zheng (3), Yan Wang (4), Huixia Li (2), Yongjian Wu (5), Guannan Jiang (6), Wei Zhang (6), Rongrong Ji (1). Affiliations: (1) MAC Lab, Department of Artificial Intelligence, School of Informatics, Xiamen University, China; (2) MAC Lab, Department of Computer Science and Technology, School of Informatics, Xiamen University, China; (3) Peng Cheng Laboratory, Shenzhen, China; (4) Samsara, Seattle, WA, USA; (5) Tencent Technology (Shanghai) Co., Ltd, China; (6) CATL, China.
Pseudocode | No | The paper describes its approach procedurally but does not include a formal pseudocode or algorithm block.
Open Source Code | No | The paper does not provide a specific link to open-source code for the described methodology, nor does it explicitly state that the code is released or available in supplementary materials.
Open Datasets | Yes | The ImageNet dataset includes 1.2M training data and 50,000 validation data.
Dataset Splits | Yes | The ImageNet dataset includes 1.2M training data and 50,000 validation data.
Hardware Specification | Yes | OMPQ is extremely efficient, needing only a single NVIDIA GeForce GTX 1080 Ti and a single Intel(R) Xeon(R) CPU E5-2620 v4. For the experiments on the QAT quantization scheme, we use two NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions using a 'quantization framework' and optimizers like SGD, but does not provide specific version numbers for software libraries or dependencies (e.g., PyTorch version, TensorFlow version).
Experiment Setup | Yes | In the training process, the initial learning rate is set to 1e-4 and the batch size to 128. We use a cosine learning rate scheduler and an SGD optimizer with 1e-4 weight decay over 90 epochs without distillation. Following previous works, the weights and activations of the first and last layers are fixed at 8 bits, while the search space for the remaining layers is 4-8 bits.
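
For reference, the hyperparameters quoted in the Experiment Setup row correspond to a standard fine-tuning recipe. The sketch below is a minimal, hypothetical illustration in PyTorch (the paper does not release code): `model` and `train_loader` are placeholders, the OMPQ bit-width allocation itself is not reproduced, and the SGD momentum value is an assumption since the paper does not report one.

```python
# Minimal sketch of the reported fine-tuning setup (not the authors' code).
# Assumes `model` is a network whose layers were already assigned mixed-precision
# bit-widths (8 bits for the first/last layers, 4-8 bits elsewhere) and that
# `train_loader` yields ImageNet batches of size 128.
from torch import nn, optim


def finetune(model: nn.Module, train_loader, epochs: int = 90) -> None:
    criterion = nn.CrossEntropyLoss()
    # SGD with initial learning rate 1e-4 and weight decay 1e-4, as reported;
    # momentum is not stated in the paper, so 0.9 here is a common-default assumption.
    optimizer = optim.SGD(model.parameters(), lr=1e-4,
                          momentum=0.9, weight_decay=1e-4)
    # Cosine learning-rate schedule over the 90 training epochs, no distillation.
    scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

    model.train()
    for _ in range(epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
```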