F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization
Authors: Qing Jin, Jian Ren, Richard Zhuang, Sumant Hanumante, Zhengang Li, Zhiyu Chen, Yanzhi Wang, Kaiyuan Yang, Sergey Tulyakov
ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. Our approach achieves comparable and better performance, when compared not only to existing quantization techniques with INT32 multiplication or floating-point arithmetic, but also to the full-precision counterparts, achieving state-of-the-art performance. |
| Researcher Affiliation | Collaboration | Qing Jin (1,2), Jian Ren (1), Richard Zhuang (1), Sumant Hanumante (1), Zhengang Li (2), Zhiyu Chen (3), Yanzhi Wang (2), Kaiyuan Yang (3), Sergey Tulyakov (1); (1) Snap Inc., (2) Northeastern University, USA, (3) Rice University, USA |
| Pseudocode | No | The paper describes algorithms and derivations in text and mathematical formulas but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/snap-research/F8Net. |
| Open Datasets | Yes | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. |
| Dataset Splits | Yes | We verify F8Net on ImageNet for MobileNet V1/V2 and ResNet18/50. |
| Hardware Specification | Yes | For ResNet18 and MobileNet V1/V2, we use a batch size of 2048 and run the experiments on 8 A100 GPUs. |
| Software Dependencies | No | The paper mentions using PyTorchCV indirectly through a citation but does not provide specific version numbers for the software dependencies used in the experiments. |
| Experiment Setup | Yes | For the conventional training method, we train the quantized model initialized with a pre-trained full-precision one. The training of full-precision and quantized models shares the same hyperparameters, including learning rate and its scheduler, weight decay, number of epochs, optimizer, and batch size. For ResNet18 and MobileNet V1, we use an initial learning rate of 0.05, and for MobileNet V2, it is 0.1. ...150 epochs of training are conducted, with a cosine learning rate scheduler without restart. The warmup strategy is adopted, with the learning rate linearly increasing to batch size/256 * 0.05 (Goyal et al., 2017) during the first five epochs before the cosine learning rate scheduler. ...we use a batch size of 2048... The parameters are updated with an SGD optimizer and Nesterov momentum with a momentum weight of 0.9 without damping. |
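The experiment-setup row above describes the full training recipe (SGD with Nesterov momentum, five-epoch linear warmup to the scaled learning rate, then cosine annealing without restart over 150 epochs). The following is a minimal PyTorch sketch of that schedule, not the authors' released F8Net code; the model, weight-decay value, and warmup start factor are placeholder assumptions.

```python
# Sketch of the reported training schedule, assuming standard PyTorch schedulers.
import torch
import torchvision

EPOCHS = 150
WARMUP_EPOCHS = 5
BATCH_SIZE = 2048
BASE_LR = 0.05                              # 0.1 for MobileNet V2 per the paper
SCALED_LR = BATCH_SIZE / 256 * BASE_LR      # linear-scaling rule (Goyal et al., 2017)

model = torchvision.models.resnet18()       # stand-in for the quantized model

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=SCALED_LR,
    momentum=0.9,          # Nesterov momentum with weight 0.9
    nesterov=True,
    dampening=0,           # "without damping"
    weight_decay=1e-4,     # assumed value; not specified in the excerpt
)

# Linear warmup over the first five epochs, then cosine decay without restart.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, end_factor=1.0, total_iters=WARMUP_EPOCHS
)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=EPOCHS - WARMUP_EPOCHS
)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[WARMUP_EPOCHS]
)

for epoch in range(EPOCHS):
    # ... one training pass over ImageNet at batch size 2048 ...
    scheduler.step()
```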