Leveraging Inter-Layer Dependency for Post-Training Quantization

Authors: Changbao Wang, Dandan Zheng, Yuanliu Liu, Liang Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that NWQ outperforms prior state-of-the-art approaches by a large margin: 20.24% for the challenging configuration of MobileNetV2 with 2 bits on ImageNet, pushing extremely low-bit PTQ from feasibility to usability.
Researcher Affiliation | Industry | Changbao Wang, Ant Technology Group Co., Ltd. (changbao.wcb@antgroup.com); Dandan Zheng, Ant Technology Group Co., Ltd. (yuandan.zdd@antgroup.com); Yuanliu Liu, Ant Technology Group Co., Ltd. (yuanliu.lyl@antgroup.com); Liang Li, Ant Technology Group Co., Ltd. (double.ll@antgroup.com)
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | Our method is easy to implement based on the baseline codes specified in our paper, and the data is ImageNet-1K, which is publicly available.
Open Datasets | Yes | We randomly sample 1024 images from the ImageNet train set and employ CutMix [45] and Mixup [46] as data augmentation. The learning rates are 0.01 for the rounding policy and 0.0004 for the activation quantizer step size. We train for 20000 iterations with a mini-batch size of 32 on 8 Tesla V100 GPUs, taking 30 minutes for ResNet18, which is on par with BRECQ and QDROP. Our experiments are conducted on 5 architectures, including ResNet18 (Res18), MobileNetV2 (MNV2), RegNet-600MF (Reg600MF), RegNet-3.2GF (Reg3.2GF), and MnasNet (Mnas). Other settings remain the same as QDROP [40] if not specified.
Dataset Splits | Yes | We report top-1 classification accuracy on the ImageNet [9] validation set. Classical calibration set: following previous works, we compare the performance on a small calibration set of 1024 images. For a start, we achieve 0.2%-0.9% improvement even compared with strong baselines on W4A4, which are rather close to full-precision accuracy. Scaled-up calibration set: to further explore the potential of NWQ, we scale up the calibration set by 10x and reproduce QDROP on the scaled-up calibration set, building a very strong baseline.
Hardware Specification | Yes | We train for 20000 iterations with a mini-batch size of 32 on 8 Tesla V100 GPUs.
Software Dependencies | No | The paper mentions the software used ('Our code is based on open-source codes BRECQ and QDrop') but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | The learning rates are 0.01 for the rounding policy and 0.0004 for the activation quantizer step size. We train for 20000 iterations with a mini-batch size of 32 on 8 Tesla V100 GPUs, taking 30 minutes for ResNet18, which is on par with BRECQ and QDROP. Our experiments are conducted on 5 architectures, including ResNet18 (Res18), MobileNetV2 (MNV2), RegNet-600MF (Reg600MF), RegNet-3.2GF (Reg3.2GF), and MnasNet (Mnas). Other settings remain the same as QDROP [40] if not specified. (A hedged sketch of this setup follows the table.)
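
The following sketch illustrates how the calibration procedure quoted in the Open Datasets and Experiment Setup rows could be wired up in PyTorch: 1024 randomly sampled ImageNet training images, mini-batch size 32, 20000 iterations, learning rate 0.01 for the rounding policy and 0.0004 for the activation quantizer step size. This is a minimal sketch, not the authors' released code: the helper names (`build_calibration_loader`, `calibrate`, `round_policy_params`, `act_step_params`) are hypothetical, the paper's NWQ objective is replaced by a plain output-matching MSE, and the CutMix/Mixup augmentation is omitted.

```python
import random

import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms


def build_calibration_loader(imagenet_train_dir, num_images=1024, batch_size=32):
    """Randomly sample a small calibration set from the ImageNet train split."""
    tfm = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    dataset = datasets.ImageFolder(imagenet_train_dir, transform=tfm)
    indices = random.sample(range(len(dataset)), num_images)
    return DataLoader(Subset(dataset, indices), batch_size=batch_size,
                      shuffle=True, num_workers=8, drop_last=True)


def calibrate(quant_model, fp_model, calib_loader,
              round_policy_params, act_step_params,
              num_iters=20000, device="cuda"):
    """Jointly optimize the weight-rounding policy and activation step sizes.

    `round_policy_params` and `act_step_params` are assumed to be the learnable
    tensors exposed by whatever quantizer wrappers are in use. The paper's
    network-wise objective is replaced here by output matching against a frozen
    full-precision model, purely for illustration.
    """
    fp_model.eval().to(device)
    quant_model.train().to(device)
    optimizer = torch.optim.Adam([
        {"params": round_policy_params, "lr": 0.01},   # rounding policy
        {"params": act_step_params, "lr": 0.0004},     # activation step size
    ])

    step = 0
    while step < num_iters:
        for images, _ in calib_loader:
            if step >= num_iters:
                break
            images = images.to(device)
            with torch.no_grad():
                target = fp_model(images)              # full-precision reference
            loss = F.mse_loss(quant_model(images), target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            step += 1
    return quant_model
```

In practice the loop would run inside the BRECQ/QDROP code base the paper builds on, with the quantizer parameters collected from the wrapped modules; the sketch only shows how the quoted hyperparameters map onto a standard two-group optimizer.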