Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

Authors: Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method.
Researcher Affiliation | Academia | 1Xi'an Jiaotong University, 2Tsinghua University.
Pseudocode | Yes | Algorithm 1: Model compression using Differentiable Gaussian Mixture Weight Sharing.
Open Source Code | Yes | Our code has been publicly released at https://github.com/RunpeiDong/DGMS.
Open Datasets | Yes | Classification experiments are conducted on CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). The PASCAL VOC (Everingham et al., 2015) dataset is used as the detection benchmark. We use VOC2007 plus VOC2012 trainval for training and evaluate on VOC2007 test. ... In Tab. 5, we evaluate the transfer ability of 4-bit ResNet-18 across ImageNet, CUB200-2011 (Welinder et al., 2010), Stanford Cars (Krause et al., 2013), and FGVC Aircraft (Maji et al., 2013).
Dataset Splits | Yes | CUB200-2011 ... 5,994 samples for training and 5,794 images for validation. ... Stanford Cars ... 8,144 samples are used for training and 8,041 images are used for validation. ... FGVC Aircraft ... the training set involves 6,667 samples and the validation set involves 3,333 images.
Hardware Specification | Yes | Our evaluation is performed on the octa-core ARM CPU in the Qualcomm 888 (Samsung S21). ... The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019) on PH402 SKU 200 GPU devices with 32 GB memory.
Software Dependencies | Yes | The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019).
Experiment Setup | Yes | Optimization: We set the batch size as 128 on CIFAR-10 and 256 on ImageNet for all models. SGD with 0.9 momentum and 5×10⁻⁴ weight decay is used during training. All models are trained for 350 epochs on CIFAR-10 and 60 epochs on ImageNet. The max learning rate is set to 2×10⁻⁵ (1×10⁻⁵ for 2-bit ResNet-50) using a one-cycle scheduler (Smith & Topin, 2019), and the initial temperature τ in Eqn. (7) is 0.01 for all experiments.