Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
Authors: Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method. |
| Researcher Affiliation | Academia | ¹Xi'an Jiaotong University, ²Tsinghua University. |
| Pseudocode | Yes | Algorithm 1: Model compression using Differentiable Gaussian Mixture Weight Sharing. (A hedged sketch of the weight-sharing step follows the table.) |
| Open Source Code | Yes | Our codes have been publicly released at https://github.com/RunpeiDong/DGMS. |
| Open Datasets | Yes | Classification experiments are conducted on CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). PASCAL VOC (Everingham et al., 2015) dataset is used as the detection benchmark. We use VOC2007 plus VOC2012 trainval for training and evaluate on VOC2007 test. ... In Tab. 5, we evaluate the transfer ability with 4-bit ResNet-18 across ImageNet, CUB200-2011 (Welinder et al., 2010), Stanford Cars (Krause et al., 2013), and FGVC Aircraft (Maji et al., 2013). |
| Dataset Splits | Yes | CUB200-2011 ... 5,994 samples for training and 5,794 images for validation. ... Stanford Cars ... 8,144 samples are used for training and 8,041 images are used for validation. ... FGVC Aircraft ... training set involves 6,667 samples and validation set involves 3,333 images. |
| Hardware Specification | Yes | Our evaluation is performed on the octa-core ARM CPU in Qualcomm 888 (Samsung S21). ... The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019) on PH402 SKU 200 GPU devices with 32 GB memory. |
| Software Dependencies | Yes | The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019). |
| Experiment Setup | Yes | Optimization We set the batch size as 128 on CIFAR-10 and 256 on ImageNet for all the models. SGD with 0.9 momentum and 5×10⁻⁴ weight decay is used during training. All the models are trained for 350 epochs and 60 epochs respectively on CIFAR-10 and ImageNet. The max learning rate is set to 2×10⁻⁵ (1×10⁻⁵ for 2-bit ResNet-50) using the one-cycle scheduler (Smith & Topin, 2019) and the initial temperature τ in Eqn. (7) is 0.01 for all the experiments. (A minimal sketch of this training setup follows the table.) |
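
As a rough illustration of the idea behind Algorithm 1 (not the authors' released implementation; see the repository linked above), the PyTorch sketch below softly assigns each weight to K Gaussian components via a temperature-controlled softmax over component log-densities and replaces it with the responsibility-weighted mixture of component means, which keeps the weight-sharing step differentiable. The function name `gaussian_mixture_soft_quant` and its parameter names are hypothetical; `tau` plays the role of the temperature τ in Eqn. (7).

```python
import torch

def gaussian_mixture_soft_quant(w, mu, sigma, pi, tau=0.01):
    """Soft Gaussian-mixture weight sharing (illustrative sketch, hypothetical API).

    w:     weight tensor of any shape
    mu:    (K,) learnable component means (the shared weight values)
    sigma: (K,) component standard deviations
    pi:    (K,) mixing proportions
    tau:   softmax temperature (tau -> 0 approaches hard assignment)
    """
    w_flat = w.reshape(-1, 1)                                    # (N, 1)
    # Per-component log-density of each weight, up to an additive constant.
    log_prob = pi.log() - sigma.log() - 0.5 * ((w_flat - mu) / sigma) ** 2
    # Temperature-scaled softmax gives differentiable responsibilities.
    resp = torch.softmax(log_prob / tau, dim=1)                  # (N, K)
    # Soft-quantized weights: responsibility-weighted sum of component means.
    return (resp @ mu).reshape(w.shape)

# Example: 4-bit weight sharing => 2**4 = 16 candidate shared values.
K = 16
w = torch.randn(64, 128)
mu = torch.linspace(w.min().item(), w.max().item(), K, requires_grad=True)
sigma = torch.full((K,), 0.05, requires_grad=True)
pi = torch.full((K,), 1.0 / K, requires_grad=True)
w_q = gaussian_mixture_soft_quant(w, mu, sigma, pi, tau=0.01)
```

At inference, a hard assignment (argmax of the responsibilities) would map each weight to one of the K shared values, yielding the low-bit model.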
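For the optimization settings quoted in the last row, a minimal PyTorch sketch might look as follows. Only the hyperparameters come from the paper (SGD with momentum 0.9 and weight decay 5×10⁻⁴, one-cycle schedule with max learning rate 2×10⁻⁵, batch size 256 on ImageNet for 60 epochs); the model, data, and step count are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(512, 10)   # placeholder for the compressed network
steps_per_epoch = 500              # placeholder; = len(train_loader) in practice
epochs = 60                        # ImageNet schedule quoted in the table

optimizer = SGD(model.parameters(), lr=2e-5, momentum=0.9, weight_decay=5e-4)
scheduler = OneCycleLR(optimizer, max_lr=2e-5,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(256, 512)            # placeholder batch (batch size 256)
        y = torch.randint(0, 10, (256,))     # placeholder labels
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()                     # one-cycle LR advances once per batch
```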