Rethinking Data-Free Quantization as a Zero-Sum Game
Authors: Biao Qian, Yang Wang, Richang Hong, Meng Wang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The theoretical analysis and empirical studies verify the superiority of AdaSG over the state-of-the-arts. Our code is available at https://github.com/hfutqian/AdaSG. |
| Researcher Affiliation | Academia | Biao Qian, Yang Wang, Richang Hong, Meng Wang. Key Laboratory of Knowledge Engineering with Big Data, Ministry of Education, School of Computer Science and Information Engineering, Hefei University of Technology, China. yangwang@hfut.edu.cn, {hfutqian,hongrc.hfut,eric.mengwang}@gmail.com |
| Pseudocode | No | The paper does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/hfutqian/AdaSG. |
| Open Datasets | Yes | We validate AdaSG over three typical image classification datasets, including CIFAR-10, CIFAR-100 (Krizhevsky 2009) and ImageNet (ILSVRC2012) (Russakovsky et al. 2015). |
| Dataset Splits | Yes | CIFAR-10 and CIFAR-100 contain 10 and 100 classes of images, respectively. Both are split into 50K training images and 10K testing images. ImageNet consists of 1.2M samples for training and 50K samples for validation across 1,000 categories. |
| Hardware Specification | Yes | All experiments are implemented with PyTorch (Paszke et al. 2019) based on the code of GDFQ (Xu et al. 2020) and run on an NVIDIA GeForce GTX 1080 Ti GPU and an Intel(R) Core(TM) i7-6950X CPU @ 3.00GHz. |
| Software Dependencies | No | All experiments are implemented with PyTorch (Paszke et al. 2019) based on the code of GDFQ (Xu et al. 2020)... The paper names PyTorch but does not specify a version number for it or for any other software dependency. |
| Experiment Setup | Yes | For the maximization process, we construct the architecture of the generator (G) following ACGAN (Odena, Olah, and Shlens 2017), while P and Q play the role of the discriminator; G is trained with the loss function Eq. (14) using Adam (Kingma and Ba 2014) as the optimizer with a momentum of 0.9 and a learning rate of 1e-3. For the minimization process, Q is optimized with the loss function Eq. (15), where SGD with Nesterov momentum (Nesterov 1983) is adopted as the optimizer with a momentum of 0.9 and a weight decay of 1e-4. For CIFAR, the learning rate is initialized to 1e-4 and decayed by 0.1 every 100 epochs; on ImageNet it is 1e-5 (1e-4 for ResNet-50) and divided by 10 at epoch 350 (at epochs 200 and 300 for ResNet-50). The generator and quantized model are trained for 400 epochs in total. The batch size is set to 16. For the hyperparameters, α, β and γ in Eq. (14) and λl and λu in Eq. (12) are empirically set to 0.1, 1, 1, 0.3 and 0.8, respectively. (A hedged PyTorch sketch of this optimizer configuration follows the table.) |
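
As a hedged illustration of the optimizer and schedule reported in the Experiment Setup row, the sketch below reconstructs the CIFAR setting in PyTorch. The `generator` and `quantized_model` objects are hypothetical stand-ins for the paper's G and Q networks, and the AdaSG losses of Eq. (14)/Eq. (15) are omitted; only the Adam/SGD settings, the step decay, and the 400-epoch loop mirror the reported configuration.

```python
import torch

# Hypothetical stand-ins for the generator G (ACGAN-style) and the
# low-bit quantized model Q; the real architectures live in the
# authors' repository (https://github.com/hfutqian/AdaSG).
generator = torch.nn.Linear(100, 10)
quantized_model = torch.nn.Linear(10, 10)

# Maximization step: Adam for G with momentum (beta1) 0.9 and lr 1e-3.
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0.9, 0.999))

# Minimization step: SGD with Nesterov momentum for Q,
# momentum 0.9, weight decay 1e-4, initial lr 1e-4 (CIFAR setting).
opt_q = torch.optim.SGD(quantized_model.parameters(), lr=1e-4,
                        momentum=0.9, weight_decay=1e-4, nesterov=True)

# CIFAR schedule: decay Q's learning rate by 0.1 every 100 epochs.
scheduler_q = torch.optim.lr_scheduler.StepLR(opt_q, step_size=100, gamma=0.1)

BATCH_SIZE = 16
NUM_EPOCHS = 400

for epoch in range(NUM_EPOCHS):
    # ... alternate the maximization (G, Eq. 14) and minimization
    #     (Q, Eq. 15) updates over batches of size BATCH_SIZE ...
    scheduler_q.step()
```

For the reported ImageNet schedule (initial lr 1e-5, divided by 10 at epoch 350, or at epochs 200 and 300 for ResNet-50), the `StepLR` scheduler above would be swapped for `torch.optim.lr_scheduler.MultiStepLR` with the corresponding milestones.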