Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks
Authors: Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Sufficient experiments on image classification and object detection over various modern architectures demonstrate the effectiveness, generalization property, and transferability of the proposed method. |
| Researcher Affiliation | Academia | ¹Xi'an Jiaotong University, ²Tsinghua University. |
| Pseudocode | Yes | Algorithm 1: Model compression using Differentiable Gaussian Mixture Weight Sharing. (A hedged sketch of the weight-sharing step follows the table.) |
| Open Source Code | Yes | Our codes have been publicly released at https://github.com/RunpeiDong/DGMS. |
| Open Datasets | Yes | Classification experiments are conducted on CIFAR-10 (Krizhevsky, 2009) and ImageNet (Deng et al., 2009). PASCAL VOC (Everingham et al., 2015) dataset is used as the detection benchmark. We use VOC2007 plus VOC2012 trainval for training and evaluate on VOC2007 test. ... In Tab. 5, we evaluate the transfer ability with 4-bit ResNet-18 across ImageNet, CUB200-2011 (Welinder et al., 2010), Stanford Cars (Krause et al., 2013), and FGVC Aircraft (Maji et al., 2013). |
| Dataset Splits | Yes | CUB200-2011 ... 5,994 samples for training and 5,794 images for validation. ... Stanford Cars ... 8,144 samples are used for training and 8,041 images are used for validation. ... FGVC Aircraft ... training set involves 6,667 samples and validation set involves 3,333 images. |
| Hardware Specification | Yes | Our evaluation is performed on the octa-core ARM CPU in Qualcomm 888 (Samsung S21). ... The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019) on PH402 SKU 200 GPU devices with 32 GB memory. |
| Software Dependencies | Yes | The experiments are implemented with PyTorch 1.6 (Paszke et al., 2019). |
| Experiment Setup | Yes | Optimization We set the batch size as 128 on CIFAR-10 and 256 on ImageNet for all the models. SGD with 0.9 momentum and 5×10⁻⁴ weight decay is used during training. All the models are trained for 350 epochs and 60 epochs respectively on CIFAR-10 and ImageNet. The max learning rate is set to 2×10⁻⁵ (1×10⁻⁵ for 2-bit ResNet-50) using the one-cycle scheduler (Smith & Topin, 2019) and the initial temperature τ in Eqn. (7) is 0.01 for all the experiments. (A minimal sketch of this training setup follows the table.) |
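
As a rough illustration of the idea behind Algorithm 1 (not the authors' released implementation; see the repository linked above), the PyTorch sketch below softly assigns each weight to K Gaussian components via a temperature-controlled softmax over component log-densities and replaces it with the responsibility-weighted mixture of component means, which keeps the weight-sharing step differentiable. The function name `gaussian_mixture_soft_quant` and its parameter names are hypothetical; `tau` plays the role of the temperature τ in Eqn. (7).

```python
import torch

def gaussian_mixture_soft_quant(w, mu, sigma, pi, tau=0.01):
    """Soft Gaussian-mixture weight sharing (illustrative sketch, hypothetical API).

    w:     weight tensor of any shape
    mu:    (K,) learnable component means (the shared weight values)
    sigma: (K,) component standard deviations
    pi:    (K,) mixing proportions
    tau:   softmax temperature (tau -> 0 approaches hard assignment)
    """
    w_flat = w.reshape(-1, 1)                                    # (N, 1)
    # Per-component log-density of each weight, up to an additive constant.
    log_prob = pi.log() - sigma.log() - 0.5 * ((w_flat - mu) / sigma) ** 2
    # Temperature-scaled softmax gives differentiable responsibilities.
    resp = torch.softmax(log_prob / tau, dim=1)                  # (N, K)
    # Soft-quantized weights: responsibility-weighted sum of component means.
    return (resp @ mu).reshape(w.shape)

# Example: 4-bit weight sharing => 2**4 = 16 candidate shared values.
K = 16
w = torch.randn(64, 128)
mu = torch.linspace(w.min().item(), w.max().item(), K, requires_grad=True)
sigma = torch.full((K,), 0.05, requires_grad=True)
pi = torch.full((K,), 1.0 / K, requires_grad=True)
w_q = gaussian_mixture_soft_quant(w, mu, sigma, pi, tau=0.01)
```

At inference, a hard assignment (argmax of the responsibilities) would map each weight to one of the K shared values, yielding the low-bit model.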
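For the optimization settings quoted in the last row, a minimal PyTorch sketch might look as follows. Only the hyperparameters come from the paper (SGD with momentum 0.9 and weight decay 5×10⁻⁴, one-cycle schedule with max learning rate 2×10⁻⁵, batch size 256 on ImageNet for 60 epochs); the model, data, and step count are placeholders.

```python
import torch
import torch.nn.functional as F
from torch.optim import SGD
from torch.optim.lr_scheduler import OneCycleLR

model = torch.nn.Linear(512, 10)   # placeholder for the compressed network
steps_per_epoch = 500              # placeholder; = len(train_loader) in practice
epochs = 60                        # ImageNet schedule quoted in the table

optimizer = SGD(model.parameters(), lr=2e-5, momentum=0.9, weight_decay=5e-4)
scheduler = OneCycleLR(optimizer, max_lr=2e-5,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)

for epoch in range(epochs):
    for _ in range(steps_per_epoch):
        x = torch.randn(256, 512)            # placeholder batch (batch size 256)
        y = torch.randint(0, 10, (256,))     # placeholder labels
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()                     # one-cycle LR advances once per batch
```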