ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions

Authors: Hongyang Gao, Zhengyang Wang, Shuiwang Ji

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the ImageNet dataset demonstrate that ChannelNets achieve consistently better performance compared to prior methods. In this section, we evaluate the proposed ChannelNets on the ImageNet ILSVRC 2012 image classification dataset [3], which has served as the benchmark for model compression. We compare different versions of ChannelNets with other compact CNNs. Ablation studies are also conducted to show the effect of group channel-wise convolutions.
Researcher Affiliation | Academia | Hongyang Gao, Texas A&M University, College Station, TX, hongyang.gao@tamu.edu; Zhengyang Wang, Texas A&M University, College Station, TX, zhengyang.wang@tamu.edu; Shuiwang Ji, Texas A&M University, College Station, TX, sji@tamu.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, nor does it explicitly label any section as 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets | Yes | In this section, we evaluate the proposed ChannelNets on the ImageNet ILSVRC 2012 image classification dataset [3], which has served as the benchmark for model compression.
Dataset Splits | Yes | The ImageNet ILSVRC 2012 dataset contains 1.2 million training images and 50 thousand validation images. Each image is labeled by one of 1,000 classes. We follow the same data augmentation process in [5]. Images are scaled to 256×256. Randomly cropped patches with a size of 224×224 are used for training. During inference, 224×224 center crops are fed into the networks. To compare with other compact CNNs [6, 24], we train our models using training images and report accuracies computed on the validation set, since the labels of test images are not publicly available. (A hedged sketch of this preprocessing appears after the table.)
Hardware Specification | Yes | We use 4 TITAN Xp GPUs and a batch size of 512 for training, which takes about 3 days.
Software Dependencies | No | The paper mentions optimizers and activation functions but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or CUDA versions).
Experiment Setup | Yes | We train our ChannelNets using the same settings as those for MobileNets except for a minor change. For depth-wise separable convolutions, we remove the batch normalization and activation function between the depth-wise convolution and the 1×1 convolution. We observe that it has no influence on the performance while accelerating the training speed. For the proposed GCWMs, the kernel size of group channel-wise convolutions is set to 8. In depth-wise separable channel-wise convolutions, we set the kernel size to 64. In the convolutional classification layer, the kernel size of the 3-D convolution is 7×7×25. All models are trained using the stochastic gradient descent optimizer with a momentum of 0.9 for 80 epochs. The learning rate starts at 0.1 and decays by 0.1 at the 45th, 60th, 65th, 70th, and 75th epoch. Dropout [20] with a rate of 0.0001 is applied after 1×1 convolutions. We use 4 TITAN Xp GPUs and a batch size of 512 for training, which takes about 3 days.
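
The Dataset Splits and Experiment Setup rows above can be condensed into a short sketch. This is a minimal illustration, not the authors' code: it assumes PyTorch/torchvision (the paper names no framework), omits any further augmentation inherited from [5], and realizes the channel-wise convolution as a 3-D convolution with a (k, 1, 1) kernel. It also assumes a MobileNet-style final feature map of 1024 channels at 7×7 resolution, which is not stated in the table but is consistent with the reported 7×7×25 classification kernel, since 1024 − 25 + 1 = 1000 output positions, one per ImageNet class.

```python
# Hedged sketch of the reported setup; PyTorch/torchvision and the 1024-channel
# final feature map are assumptions, not statements from the paper.
import torch
import torch.nn as nn
from torchvision import transforms

# Preprocessing: scale to 256x256, random 224x224 crops for training,
# 224x224 center crops for inference (further augmentation from [5] omitted).
train_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.RandomCrop(224),
    transforms.ToTensor(),
])
eval_transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

class ChannelWiseConv(nn.Module):
    """1-D convolution slid along the channel dimension, shared across spatial
    positions, realized here as a 3-D convolution with a (k, 1, 1) kernel.
    Without padding the channel dimension shrinks by k - 1; the paper's exact
    padding/stride choices are not given in the table above."""
    def __init__(self, kernel_size):
        super().__init__()
        self.conv = nn.Conv3d(1, 1, kernel_size=(kernel_size, 1, 1))

    def forward(self, x):                      # x: (N, C, H, W)
        return self.conv(x.unsqueeze(1)).squeeze(1)

class ConvClassificationLayer(nn.Module):
    """Convolutional classification layer: a single 3-D convolution whose 7x7
    spatial kernel covers the whole feature map and whose channel kernel
    (25 when going from 1024 to 1000) yields one output per class."""
    def __init__(self, in_channels=1024, num_classes=1000, spatial=7):
        super().__init__()
        channel_kernel = in_channels - num_classes + 1            # 25
        self.conv = nn.Conv3d(1, 1, kernel_size=(channel_kernel, spatial, spatial))

    def forward(self, x):                      # x: (N, 1024, 7, 7)
        return self.conv(x.unsqueeze(1)).flatten(1)               # (N, 1000) logits

# The paper reports kernel size 8 for group channel-wise convolutions and 64
# for depth-wise separable channel-wise convolutions.
cw = ChannelWiseConv(kernel_size=8)
y = cw(torch.randn(2, 64, 7, 7))               # -> (2, 57, 7, 7) without padding

head = ConvClassificationLayer()               # placeholder module, not the full ChannelNet
logits = head(torch.randn(2, 1024, 7, 7))      # -> (2, 1000)

# Optimization as reported: SGD with momentum 0.9 for 80 epochs, lr 0.1 decayed
# by 0.1 at epochs 45, 60, 65, 70, 75; batch size 512 across 4 TITAN Xp GPUs.
optimizer = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[45, 60, 65, 70, 75], gamma=0.1)
```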