MultiQuant: Training Once for Multi-bit Quantization of Neural Networks
Authors: Ke Xu, Qiantai Feng, Xingyi Zhang, Dong Wang
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on the ImageNet dataset demonstrate that the MultiQuant method can attain quantization results under different bit-widths comparable with quantization-aware training, without retraining. |
| Researcher Affiliation | Academia | 1 School of Artificial Intelligence, Anhui University, Hefei, China; 2 Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Hefei, China; 3 Institute of Information Science, Beijing Jiaotong University, Beijing, China; 4 Beijing Key Laboratory of Advanced Information Science and Network Technology, Beijing, China |
| Pseudocode | No | The paper does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | We present results of using the pre-trained benchmark models by TorchVision, including ResNet18, ResNet50 [He et al., 2016] and MobileNetV2 [Sandler et al., 2018] on the ImageNet [Deng et al., 2009] dataset. [...] Table 2: Comparison of MultiQuant results with QAT and PTQ under different loss function and bit-width of ResNet-20 on CIFAR-10. (A minimal model-loading sketch follows the table.) |
| Dataset Splits | Yes | Figure 4: Rank correlation between actual accuracy and predicted accuracy on split validation set of ImageNet. |
| Hardware Specification | No | The paper mentions 'commodity GPUs and specialized accelerators' in Section 4.3 but does not provide specific hardware models (e.g., GPU names, CPU types, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'TorchVision' and 'Adam' but does not specify version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We train the models for 90 epochs using Adam [Kingma and Ba, 2015] with cosine learning-rate decay. The batch size is set to 256, the base learning rate to 0.001, and the weight decay to 0. (A configuration sketch follows the table.) |
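
The pre-trained benchmark models quoted in the Open Datasets row are the standard TorchVision checkpoints. A minimal loading sketch is shown below, assuming the classic `pretrained=True` keyword (newer torchvision releases use a `weights=` argument instead); the evaluation transform is the usual ImageNet preprocessing, not something specified by the paper.

```python
# Minimal sketch: loading the pre-trained TorchVision benchmark models
# referenced in the Open Datasets row (ResNet18, ResNet50, MobileNetV2).
import torchvision.models as models
from torchvision import transforms

resnet18 = models.resnet18(pretrained=True)
resnet50 = models.resnet50(pretrained=True)
mobilenet_v2 = models.mobilenet_v2(pretrained=True)

# Standard ImageNet evaluation preprocessing (an assumption, not from the paper).
imagenet_eval_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```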
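The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch optimizer/scheduler configuration. The sketch below is an illustration under assumptions, not the authors' code: the ResNet18 backbone and the dummy dataset are stand-ins so the snippet runs on its own.

```python
# Sketch of the reported setup: Adam with cosine learning-rate decay,
# 90 epochs, batch size 256, base learning rate 0.001, weight decay 0.
import torch
import torchvision.models as models

EPOCHS = 90
BATCH_SIZE = 256

model = models.resnet18(pretrained=True)  # stand-in backbone
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

# A dummy dataset stands in for ImageNet so the sketch is self-contained.
dummy_data = torch.utils.data.TensorDataset(
    torch.randn(BATCH_SIZE, 3, 224, 224),
    torch.randint(0, 1000, (BATCH_SIZE,)))
loader = torch.utils.data.DataLoader(dummy_data, batch_size=BATCH_SIZE,
                                     shuffle=True)

for epoch in range(EPOCHS):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()  # cosine decay stepped once per epoch
```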