Rethinking Data-Free Quantization as a Zero-Sum Game
Authors: Biao Qian, Yang Wang, Richang Hong, Meng Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical analysis and empirical studies verify the superiority of AdaSG over the state-of-the-arts. Our code is available at https://github.com/hfutqian/AdaSG. |
| Researcher Affiliation | Academia | Biao Qian, Yang Wang, Richang Hong, Meng Wang. Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, China. yangwang@hfut.edu.cn, {hfutqian,hongrc.hfut,eric.mengwang}@gmail.com |
| Pseudocode | No | The paper does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/hfutqian/AdaSG. |
| Open Datasets | Yes | We validate AdaSG over three typical image classification datasets, including CIFAR-10, CIFAR-100 (Krizhevsky 2009) and ImageNet (ILSVRC2012) (Russakovsky et al. 2015). |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 contain 10 and 100 classes of images, respectively. Both are split into 50K training images and 10K testing images. ImageNet consists of 1.2M samples for training and 50K samples for validation across 1,000 categories. |
| Hardware Specification | Yes | All experiments are implemented with PyTorch (Paszke et al. 2019) based on the code of GDFQ (Xu et al. 2020) and run on an NVIDIA GeForce GTX 1080 Ti GPU and an Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz. |
| Software Dependencies | No | All experiments are implemented with PyTorch (Paszke et al. 2019) based on the code of GDFQ (Xu et al. 2020)... The paper names PyTorch but does not specify a version number for it or for any other software dependency. |
| Experiment Setup | Yes | For the maximization process, we construct the architecture of the generator (G) following ACGAN (Odena, Olah, and Shlens 2017), while P and Q play the role of the discriminator; G is trained with the loss function Eq. (14) using Adam (Kingma and Ba 2014) as the optimizer with a momentum of 0.9 and a learning rate of 1e-3. For the minimization process, Q is optimized with the loss function Eq. (15), where SGD with Nesterov momentum (Nesterov 1983) is adopted as the optimizer with a momentum of 0.9 and a weight decay of 1e-4. For CIFAR, the learning rate is initialized to 1e-4 and decayed by 0.1 every 100 epochs; on ImageNet it is 1e-5 (1e-4 for ResNet-50) and divided by 10 at epoch 350 (at epochs 200 and 300 for ResNet-50). The generator and quantized model are trained for 400 epochs in total. The batch size is set to 16. For the hyperparameters, α, β and γ in Eq. (14) and λl and λu in Eq. (12) are empirically set to 0.1, 1, 1, 0.3 and 0.8, respectively. (A hedged PyTorch sketch of this optimizer configuration follows the table.) |
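
As a hedged illustration of the optimizer and schedule reported in the Experiment Setup row, the sketch below reconstructs the CIFAR setting in PyTorch. The `generator` and `quantized_model` objects are hypothetical stand-ins for the paper's G and Q networks, and the AdaSG losses of Eq. (14)/Eq. (15) are omitted; only the Adam/SGD settings, the step decay, and the 400-epoch loop mirror the reported configuration.

```python
import torch

# Hypothetical stand-ins for the generator G (ACGAN-style) and the
# low-bit quantized model Q; the real architectures live in the
# authors' repository (https://github.com/hfutqian/AdaSG).
generator = torch.nn.Linear(100, 10)
quantized_model = torch.nn.Linear(10, 10)

# Maximization step: Adam for G with momentum (beta1) 0.9 and lr 1e-3.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Minimization step: SGD with Nesterov momentum for Q,
# momentum 0.9, weight decay 1e-4, initial lr 1e-4 (CIFAR setting).
opt_q = torch.optim.SGD(quantized_model.parameters(), lr=1e-4,
                        momentum=0.9, weight_decay=1e-4, nesterov=True)

# CIFAR schedule: decay Q's learning rate by 0.1 every 100 epochs.
scheduler_q = torch.optim.lr_scheduler.StepLR(opt_q, step_size=100, gamma=0.1)

BATCH_SIZE = 16
NUM_EPOCHS = 400

for epoch in range(NUM_EPOCHS):
    # ... alternate the maximization (G, Eq. 14) and minimization
    #     (Q, Eq. 15) updates over batches of size BATCH_SIZE ...
    scheduler_q.step()
```

For the reported ImageNet schedule (initial lr 1e-5, divided by 10 at epoch 350, or at epochs 200 and 300 for ResNet-50), the `StepLR` scheduler above would be swapped for `torch.optim.lr_scheduler.MultiStepLR` with the corresponding milestones.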