F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization

Authors: Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. Our approach achieves comparable and better performance, when compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic, but also to the full-precision counterparts, achieving state-of-the-art performance.
Researcher Affiliation | Collaboration | Qing Jin (1,2), Jian Ren (1), Richard Zhuang (1), Sumant Hanumante (1), Zhengang Li (2), Zhiyu Chen (3), Yanzhi Wang (2), Kaiyuan Yang (3), Sergey Tulyakov (1); 1 Snap Inc., 2 Northeastern University, USA, 3 Rice University, USA
Pseudocode | No | The paper describes algorithms and derivations in text and mathematical formulas but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/snap-research/F8Net.
Open Datasets | Yes | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50.
Dataset Splits | Yes | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50.
Hardware Specification | Yes | For ResNet18 and MobileNet V1/V2, we use batch size of 2048 and run the experiments on 8 A100 GPUs.
Software Dependencies | No | The paper mentions using PyTorchCV indirectly through a citation but does not provide specific version numbers for the software dependencies used in the experiments.
Experiment Setup | Yes | For conventional training method, we train the quantized model initialized with a pre-trained full-precision one. The training of full-precision and quantized models shares the same hyperparameters, including learning rate and its scheduler, weight decay, number of epochs, optimizer, and batch size. For ResNet18 and MobileNet V1, we use an initial learning rate of 0.05, and for MobileNet V2, it is 0.1. ... 150 epochs of training are conducted, with cosine learning rate scheduler without restart. The warmup strategy is adopted with linear increasing (batchsize/256 * 0.05) (Goyal et al., 2017) during the first five epochs before cosine learning rate scheduler. ... we use batch size of 2048 ... The parameters are updated with SGD optimizer and Nesterov momentum with a momentum weight of 0.9 without damping.
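The Experiment Setup row above quotes the optimizer and learning-rate schedule. The following is a minimal PyTorch sketch of that configuration (SGD with Nesterov momentum 0.9 and no dampening, five epochs of linear warmup to batchsize/256 * 0.05, then cosine decay over 150 epochs without restart). The placeholder model and the weight-decay value are assumptions, since the quoted excerpt does not specify them.

```python
import math

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# Placeholder model; in the paper this would be the quantized network
# initialized from a pre-trained full-precision checkpoint.
model = torch.nn.Linear(10, 10)

batch_size = 2048
base_lr = 0.05 * batch_size / 256   # 0.05 for ResNet18 / MobileNet V1 (0.1 for MobileNet V2)
total_epochs, warmup_epochs = 150, 5

optimizer = SGD(
    model.parameters(),
    lr=base_lr,
    momentum=0.9,        # Nesterov momentum of 0.9, no dampening
    dampening=0,
    nesterov=True,
    weight_decay=1e-4,   # assumed value; the quoted setup does not state it
)

def lr_lambda(epoch: int) -> float:
    # Linear warmup over the first five epochs, then cosine decay without restart.
    if epoch < warmup_epochs:
        return (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = LambdaLR(optimizer, lr_lambda)
# Training loop (not shown): run one epoch, then call scheduler.step().
```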
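For context on the "fixed-point 8-bit only multiplication" named in the title and the quoted abstract, here is a generic sketch of simulated signed 8-bit fixed-point quantization with a fractional length. It is not F8Net's actual per-layer fractional-length selection; the function name and the example fractional lengths are illustrative only.

```python
import torch

def fake_quant_fixed_point(x: torch.Tensor, frac_len: int) -> torch.Tensor:
    """Simulate signed 8-bit fixed-point quantization with `frac_len` fractional
    bits: scale onto the integer grid, round, clamp to [-128, 127], rescale back."""
    scale = 2.0 ** frac_len                              # one LSB represents 2^-frac_len
    q = torch.clamp(torch.round(x * scale), -128, 127)   # integer code in the INT8 range
    return q / scale

# Two fixed-point operands; their product can be computed with 8-bit integer
# multipliers plus a power-of-two rescaling (bit shift), which is the arithmetic
# regime the title refers to, as opposed to INT32 or floating-point multiplication.
a = fake_quant_fixed_point(torch.randn(4), frac_len=5)
w = fake_quant_fixed_point(torch.randn(4), frac_len=6)
print(a * w)
```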