Decoupled Convolutions for CNNs

Authors: Guotian Xie, Ting Zhang, Kuiyuan Yang, Jianhuang Lai, Jingdong Wang

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach achieves comparable classification performance with the standard uncoupled convolution, but with a smaller model size over CIFAR-100, CIFAR-10 and ImageNet. Experiments / Datasets: We use three datasets to demonstrate our network. The first is the benchmark ImageNet dataset (ILSVRC2012) (Russakovsky et al. 2015) that consists of 1,000 classes.
Researcher Affiliation | Collaboration | Guotian Xie¹,², Ting Zhang⁴, Kuiyuan Yang³, Jianhuang Lai¹,², Jingdong Wang⁴. ¹School of Data and Computer Science, Sun Yat-Sen University; ²Guangdong Province Key Laboratory of Information Security; ³DeepMotion; ⁴Microsoft Research. Emails: xieguotian1990@gmail.com, {Ting.Zhang, jingdw}@microsoft.com, kuiyuanyang@deepmotion.ai, stsljh@mail.sysu.edu.cn
Pseudocode | Yes | Algorithm 1: The training process of M_l
Open Source Code | No | The paper does not provide any concrete access to source code.
Open Datasets | Yes | We use three datasets to demonstrate our network. The first is the benchmark ImageNet dataset (ILSVRC2012) (Russakovsky et al. 2015) that consists of 1,000 classes. ImageNet contains over 1.2 million training images and 50,000 validation images. For testing, we report the top-1 accuracy of a center crop on the ImageNet validation set. The results reported are the best performance of the model during training. The other two are the CIFAR-100 dataset, which contains 50,000 training images and 10,000 test images, each labeled with one of 100 classes, and the CIFAR-10 dataset, which also consists of 50,000 training images and 10,000 test images, each labeled with one of 10 classes.
Dataset Splits | Yes | ImageNet contains over 1.2 million training images and 50,000 validation images. For testing, we report the top-1 accuracy of a center crop on the ImageNet validation set.
Hardware Specification | No | No specific hardware details (such as GPU or CPU models) used for the experiments were found.
Software Dependencies | No | The paper mentions 'We implement our model based on Caffe (Jia et al. 2014)' but does not provide specific version numbers for Caffe or other software dependencies.
Experiment Setup | Yes | For the 1,000-class classification task on ImageNet, we train all models for 500,000 iterations with batch size 256. For CIFAR-100 and CIFAR-10, we train for 180,000 iterations with batch size 64. The weight decay is set to 0.0001 and the momentum to 0.9. We set the initial learning rate to 0.1 and divide it by 10 every 150,000 iterations on ImageNet and every 50,000 iterations on CIFAR. On ImageNet, we use multi-scale augmentation (randomly resizing the image to a scale within the range [256, 480]) and random cropping with random horizontal mirroring. We initialize the weights with the MSRA initialization technique introduced in (He et al. 2015) and train the models from scratch. We train all models with SGD with Nesterov momentum.
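The step-decay learning-rate schedule quoted in the setup row can be sketched in plain Python. This is a minimal illustration only; the paper implemented its models in Caffe, and the `learning_rate` helper below is a hypothetical name, not code from the authors.

```python
def learning_rate(iteration, initial_lr=0.1, step=150_000, gamma=0.1):
    """Step-decay schedule from the quoted setup: the learning rate
    starts at `initial_lr` (0.1 in the paper) and is divided by 10
    every `step` iterations (150,000 on ImageNet; use step=50_000
    for the CIFAR schedule)."""
    return initial_lr * gamma ** (iteration // step)

# ImageNet schedule over the 500,000 training iterations:
# the rate steps 0.1 -> 0.01 -> 0.001 -> 0.0001.
imagenet_lrs = [learning_rate(i) for i in (0, 150_000, 300_000, 450_000)]

# CIFAR schedule over the 180,000 training iterations,
# stepping every 50,000 iterations instead.
cifar_lrs = [learning_rate(i, step=50_000) for i in (0, 50_000, 100_000, 150_000)]
```

Under this reading, an ImageNet run sees three learning-rate drops before the 500,000th iteration, and a CIFAR run sees three drops before the 180,000th.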