Learning Strides in Convolutional Neural Networks

Authors: Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on audio and image classification show the generality and effectiveness of our solution: we use DiffStride as a drop-in replacement to standard downsampling layers and outperform them."
Researcher Affiliation | Collaboration | "Rachid Riad¹, Olivier Teboul², David Grangier² & Neil Zeghidour². ¹ENS, INRIA, INSERM, UPEC, PSL Research University; ²Google Research. rachid.riad@ens.fr, {teboul, grangier, neilz}@google.com"
Pseudocode | Yes | "All these steps are summarized by Algorithm 1 and illustrated on a single-channel image in Figure 1."
Open Source Code | Yes | "We release our implementation of DiffStride: https://github.com/google-research/diffstride"
Open Datasets | Yes | "Experiments on audio and image classification show the generality and effectiveness of our solution: we use DiffStride as a drop-in replacement to standard downsampling layers and outperform them. In particular, we show that introducing our layer into a ResNet-18 architecture allows keeping consistent high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations. [...] CIFAR10 consists of 32 × 32 images labeled in 10 classes with 6,000 images per class. We use the official split, with 50,000 images for training and 10,000 images for testing. [...] We also compare the ResNet-18 architectures on the ImageNet dataset (Deng et al., 2009), which contains 1,000 classes. The models are trained on the official training split of the ImageNet dataset (1.28M images) and we report our results on the validation set (50k images)."
Dataset Splits | Yes | "CIFAR10 consists of 32 × 32 images labeled in 10 classes with 6,000 images per class. We use the official split, with 50,000 images for training and 10,000 images for testing. [...] The models are trained on the official training split of the ImageNet dataset (1.28M images) and we report our results on the validation set (50k images)."
Hardware Specification | Yes | "Table A.2: Per-step time and peak memory usage of Spectral Pooling and DiffStride relative to strided convolutions, on a V100 GPU."
Software Dependencies | Yes | "Moreover, we release TensorFlow 2.0 code for training a Pre-Act ResNet-18 with strided convolutions, spectral pooling or DiffStride on CIFAR10 and CIFAR100, with DiffStride being implemented as a stand-alone, reusable Keras layer."
Experiment Setup | Yes | "We train on all datasets with stochastic gradient descent (SGD) (Bottou et al., 1998) with a learning rate of 0.1, a batch size of 256 and a momentum (Qian, 1999) of 0.9. On CIFAR, we train models for 400 epochs, dividing the learning rate by 10 at 200 epochs and again by 10 at 300 epochs, with a weight decay of 5×10⁻³. For CIFAR, we apply random cropping on the input images and left-right random flipping. On ImageNet, we train with a weight decay of 1×10⁻³ for 90 epochs, dividing the learning rate by 10 at epochs 30, 60 and 80. We apply random cropping on the input images as in (Szegedy et al., 2015) and left-right random flipping."
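The Pseudocode and Open Source Code rows reference the paper's Algorithm 1, which downsamples a feature map by cropping the low-frequency region of its 2D spectrum with a mask parameterized by learnable, possibly fractional strides. The sketch below shows only the underlying spectral cropping (the fixed-mask spectral-pooling special case), in NumPy; the differentiable stride mask that makes strides learnable is omitted, and the function name and rescaling convention are illustrative assumptions, not the released API.

```python
import numpy as np

def spectral_downsample(x, stride):
    """Downsample a single-channel image by cropping low frequencies.

    Sketch of the spectral-pooling mechanism DiffStride builds on; the
    learnable, differentiable stride mask of the paper is not included.
    """
    h, w = x.shape
    new_h, new_w = int(h / stride), int(w / stride)
    # Centre the spectrum so low frequencies sit in the middle.
    spectrum = np.fft.fftshift(np.fft.fft2(x))
    # Keep a (new_h, new_w) window of low frequencies around the centre.
    top = (h - new_h) // 2
    left = (w - new_w) // 2
    cropped = spectrum[top:top + new_h, left:left + new_w]
    # Back to the spatial domain; rescale so the output amplitude
    # stays on the same scale as the input (a convention choice).
    out = np.fft.ifft2(np.fft.ifftshift(cropped)).real
    return out * (new_h * new_w) / (h * w)

x = np.random.default_rng(0).standard_normal((32, 32))
y = spectral_downsample(x, 2.0)
print(y.shape)  # (16, 16)
```

Because the crop size is computed from a real-valued stride, non-integer strides such as 1.5 yield intermediate output sizes (here 21 × 21 from a 32 × 32 input), which is what lets DiffStride treat strides as continuous parameters.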
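The Experiment Setup row describes step learning-rate schedules (divide by 10 at fixed epochs). A minimal plain-Python sketch of that schedule, with the function name chosen here for illustration:

```python
def step_lr(base_lr, boundaries, epoch, factor=0.1):
    """Step schedule: multiply base_lr by `factor` (0.1 = divide by 10)
    at each boundary epoch reached, as in the experiment setup."""
    lr = base_lr
    for boundary in boundaries:
        if epoch >= boundary:
            lr *= factor
    return lr

# CIFAR: 400 epochs, learning rate divided by 10 at epochs 200 and 300.
assert step_lr(0.1, (200, 300), 199) == 0.1
assert abs(step_lr(0.1, (200, 300), 350) - 0.001) < 1e-12

# ImageNet: 90 epochs, divided by 10 at epochs 30, 60 and 80.
print(step_lr(0.1, (30, 60, 80), 85))
```

In the released TensorFlow code this would typically be expressed with a built-in piecewise-constant schedule passed to the SGD optimizer rather than hand-rolled as above.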