Split to Be Slim: An Overlooked Redundancy in Vanilla Convolution

Authors: Qiulin Zhang, Zhuqing Jiang, Qishuo Lu, Jia'nan Han, Zhengxin Zeng, Shanghua Gao, Aidong Men

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To show the effectiveness of the proposed SPConv, in this section, we conduct experiments with only the widely-used 3×3 kernels being replaced by our SPConv modules.
Researcher Affiliation | Academia | Qiulin Zhang¹, Zhuqing Jiang¹, Qishuo Lu¹, Jia'nan Han¹, Zhengxin Zeng¹, Shang-Hua Gao², Aidong Men¹. ¹Beijing University of Posts and Telecommunications, ²Nankai University. {qiulinzhang, jiangzhuqing, hanjianan, zengzhengxinsice, menad}@bupt.edu.cn, shgao@mail.nankai.edu.cn
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (e.g., clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | No | The paper does not provide concrete access to its source code, nor does it explicitly state that the code for its methodology is released or available.
Open Datasets | Yes | Firstly, we perform small scale image classification experiments on the CIFAR-10 dataset [Krizhevsky et al., 2009] with ResNet-20 [He et al., 2016] and VGG-16 [Simonyan and Zisserman, 2015] architectures. Then we experiment a large scale 1000-class single label classification task on ImageNet-2012 [Deng et al., 2009] with ResNet-50 [He et al., 2016] architecture. To explore SPConv's generality further, we also conduct a multi-label object detection experiment on MS COCO dataset [Lin et al., 2014b].
Dataset Splits | Yes | For fair comparisons, all models in each experiment, including reimplemented baselines and SPConv-equipped models, are trained from scratch on 4 NVIDIA Tesla V100 GPUs with the default data augmentation and training strategy which are optimized for vanilla convolution and no other tricks are used. Therefore, our proposed SPConv may achieve better performance with extensive hyper-parameter searches. More ablation studies are performed on small scale CIFAR-10 dataset. ... models are trained on the COCO trainval35k set and tested on the left 5K minival set.
Hardware Specification | Yes | For fair comparisons, all models in each experiment, including reimplemented baselines and SPConv-equipped models, are trained from scratch on 4 NVIDIA Tesla V100 GPUs with the default data augmentation and training strategy which are optimized for vanilla convolution and no other tricks are used. ... Inference time is tested on a single NVIDIA Tesla V100 with NVIDIA DALI as data pipelines.
Software Dependencies | No | The paper mentions "NVIDIA DALI project", "apex [Micikevicius et al., 2018]", and "mmdetection [Chen et al., 2019a]", but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | Optimization is performed using SGD with weight decay = 5e-4, batch-size = 128, initial learning rate = 0.1 which is divided by 10 every 50 epochs. ... With the default settings, the learning rate starts at 0.1 and decays by a factor of 10 every 30 epochs, using synchronous SGD with weight decay 1e-4, momentum 0.9 and a mini-batch of 256 to train the model from scratch for 90 epochs.
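
The Research Type row above notes that only the 3×3 kernels are swapped for the authors' SPConv modules. Since this page does not reproduce the SPConv implementation itself, the snippet below is only a minimal PyTorch sketch of how such a drop-in swap could be wired; `make_module` stands in for any SPConv-style constructor and is an assumed placeholder, not the authors' code.

```python
from torch import nn


def replace_3x3_convs(model: nn.Module, make_module) -> nn.Module:
    """Recursively swap every 3x3 nn.Conv2d for a same-shaped drop-in module.

    `make_module(in_ch, out_ch, stride=..., padding=..., bias=...)` is a
    placeholder for an SPConv-style constructor; it is assumed here, not
    taken from the paper.
    """
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d) and child.kernel_size == (3, 3):
            setattr(model, name, make_module(
                child.in_channels, child.out_channels,
                stride=child.stride, padding=child.padding,
                bias=child.bias is not None))
        else:
            replace_3x3_convs(child, make_module)
    return model
```

For example, `replace_3x3_convs(resnet50(), SPConv3x3)` would rebuild a ResNet-50 around a hypothetical `SPConv3x3` class, leaving the 1×1 and 7×7 convolutions untouched, which matches the quoted evaluation protocol.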
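
The Experiment Setup row quotes two schedules: CIFAR-10 (SGD, weight decay 5e-4, batch size 128, lr 0.1 divided by 10 every 50 epochs) and ImageNet-2012 (lr 0.1 decayed by 10× every 30 epochs, synchronous SGD, weight decay 1e-4, momentum 0.9, batch 256, 90 epochs from scratch). As a minimal sketch, assuming a standard single-GPU PyTorch loop (the paper trains on 4 V100s), the ImageNet schedule could be set up as below; `train_loader` is an assumed DataLoader, not specified in the quoted text.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import StepLR
from torchvision.models import resnet50


def train_imagenet(model: nn.Module, train_loader, epochs: int = 90,
                   device: str = "cuda") -> nn.Module:
    """Train from scratch with the ImageNet-2012 schedule quoted above."""
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    # SGD: initial lr 0.1, momentum 0.9, weight decay 1e-4.
    optimizer = optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=1e-4)
    # Learning rate decays by a factor of 10 every 30 epochs.
    scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

    for _ in range(epochs):
        for images, targets in train_loader:  # mini-batch of 256 in the paper
            images, targets = images.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), targets)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model


# Usage sketch (DataLoader construction omitted):
# model = train_imagenet(resnet50(num_classes=1000), train_loader)
```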