Learning Strides in Convolutional Neural Networks
Authors: Rachid Riad, Olivier Teboul, David Grangier, Neil Zeghidour
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on audio and image classification show the generality and effectiveness of our solution: we use DiffStride as a drop-in replacement to standard downsampling layers and outperform them. |
| Researcher Affiliation | Collaboration | Rachid Riad¹, Olivier Teboul², David Grangier² & Neil Zeghidour²; ¹ENS, INRIA, INSERM, UPEC, PSL Research University; ²Google Research; rachid.riad@ens.fr, {teboul, grangier, neilz}@google.com |
| Pseudocode | Yes | All these steps are summarized by Algorithm 1 and illustrated on a single-channel image in Figure 1. (A hedged sketch of such a layer is given after this table.) |
| Open Source Code | Yes | We release our implementation of DiffStride (https://github.com/google-research/diffstride). |
| Open Datasets | Yes | Experiments on audio and image classification show the generality and effectiveness of our solution: we use DiffStride as a drop-in replacement to standard downsampling layers and outperform them. In particular, we show that introducing our layer into a ResNet-18 architecture allows keeping consistent high performance on CIFAR10, CIFAR100 and ImageNet even when training starts from poor random stride configurations. CIFAR10 consists of 32 × 32 images labeled in 10 classes with 6000 images per class. We use the official split, with 50,000 images for training and 10,000 images for testing. We also compare the ResNet-18 architectures on the ImageNet dataset (Deng et al., 2009), which contains 1,000 classes. The models are trained on the official training split of the ImageNet dataset (1.28M images) and we report our results on the validation set (50k images). |
| Dataset Splits | Yes | CIFAR10 consists of 32 × 32 images labeled in 10 classes with 6000 images per class. We use the official split, with 50,000 images for training and 10,000 images for testing. The models are trained on the official training split of the ImageNet dataset (1.28M images) and we report our results on the validation set (50k images). |
| Hardware Specification | Yes | Table A.2: Per-step time and peak memory usage of Spectral Pooling and DiffStride relative to strided convolutions, on a V100 GPU. |
| Software Dependencies | Yes | Moreover, we release TensorFlow 2.0 code for training a PreAct ResNet-18 with strided convolutions, spectral pooling or DiffStride on CIFAR10 and CIFAR100, with DiffStride being implemented as a stand-alone, reusable Keras layer. |
| Experiment Setup | Yes | We train on all datasets with stochastic gradient descent (SGD) (Bottou et al., 1998) with a learning rate of 0.1, a batch size of 256 and a momentum (Qian, 1999) of 0.9. On CIFAR, we train models for 400 epochs, dividing the learning rate by 10 at 200 epochs and again by 10 at 300 epochs, with a weight decay of 5·10⁻³. For CIFAR, we apply random cropping on the input images and left-right random flipping. On ImageNet, we train with a weight decay of 1·10⁻³ for 90 epochs, dividing the learning rate by 10 at epochs 30, 60 and 80. We apply random cropping on the input images as in (Szegedy et al., 2015) and left-right random flipping. (A hedged sketch of this optimizer setup also follows the table.) |
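
To make the Algorithm 1 steps quoted in the Pseudocode row concrete, below is a minimal, hypothetical TensorFlow/Keras sketch of a DiffStride-style downsampling layer. It is not the released `google-research/diffstride` implementation: the class and function names (`DiffStride2DSketch`, `smooth_window`), the linear-ramp window parameterization, the stride constraint, and the lack of spectral renormalization are all simplifying assumptions. It only illustrates the overall recipe: 2D DFT, multiplication by a smooth box window whose width depends differentiably on learnable strides, spectral cropping, and inverse DFT.

```python
import tensorflow as tf


def smooth_window(size, half_width, smoothness):
    """Differentiable 1D low-pass window over centered frequency bins.

    Returns 1 inside a pass band of (float, learnable) half-width
    `half_width`, 0 far outside, with a linear ramp of `smoothness` bins
    in between, so gradients can flow into `half_width` (and hence into
    the stride). The exact window shape here is an assumption.
    """
    center = tf.cast(size // 2, tf.float32)
    freqs = tf.abs(tf.range(tf.cast(size, tf.float32)) - center)
    return tf.clip_by_value((half_width - freqs) / smoothness + 1.0, 0.0, 1.0)


class DiffStride2DSketch(tf.keras.layers.Layer):
    """Hypothetical, simplified DiffStride-style downsampling layer."""

    def __init__(self, init_stride=2.0, smoothness=4.0, **kwargs):
        super().__init__(**kwargs)
        self.init_stride = init_stride
        self.smoothness = smoothness

    def build(self, input_shape):
        # One learnable fractional stride per spatial axis, kept >= 1.
        self.stride = self.add_weight(
            name="stride", shape=(2,),
            initializer=tf.keras.initializers.Constant(self.init_stride),
            constraint=lambda s: tf.maximum(s, 1.0),
            trainable=True)

    def call(self, x):  # x: (batch, height, width, channels)
        h, w = tf.shape(x)[1], tf.shape(x)[2]
        hf, wf = tf.cast(h, tf.float32), tf.cast(w, tf.float32)
        # FFT acts on the two innermost dims, so move channels before H, W.
        spec = tf.signal.fft2d(
            tf.cast(tf.transpose(x, [0, 3, 1, 2]), tf.complex64))
        spec = tf.signal.fftshift(spec, axes=(2, 3))
        # Smooth 2D window; gradients w.r.t. the strides flow through it.
        mask = (smooth_window(h, hf / (2.0 * self.stride[0]), self.smoothness)[:, None]
                * smooth_window(w, wf / (2.0 * self.stride[1]), self.smoothness)[None, :])
        spec = spec * tf.cast(mask, tf.complex64)
        # Crop the spectrum center; the integer crop size is non-differentiable
        # (stop_gradient), and a margin keeps the attenuation ramp inside the
        # crop so the stride gradient survives.
        margin = int(2 * self.smoothness)
        out_h = tf.minimum(h, margin + tf.cast(
            tf.math.ceil(hf / tf.stop_gradient(self.stride[0])), tf.int32))
        out_w = tf.minimum(w, margin + tf.cast(
            tf.math.ceil(wf / tf.stop_gradient(self.stride[1])), tf.int32))
        top, left = (h - out_h) // 2, (w - out_w) // 2
        spec = spec[:, :, top:top + out_h, left:left + out_w]
        spec = tf.signal.ifftshift(spec, axes=(2, 3))
        y = tf.math.real(tf.signal.ifft2d(spec))
        return tf.transpose(y, [0, 2, 3, 1])  # back to (batch, h', w', channels)
```

Used as a drop-in replacement in the spirit of the paper, a stride-2 convolution would become a stride-1 convolution followed by `DiffStride2DSketch(init_stride=2.0)`, letting the downsampling rate be learned by backpropagation.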
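The Experiment Setup row fully specifies the CIFAR optimizer, and it maps directly onto standard Keras APIs. The sketch below assumes the 50,000-image CIFAR training split and the stated batch size of 256 to derive steps per epoch; everything else restates the quoted hyperparameters.

```python
import tensorflow as tf

# Quoted setup: SGD, learning rate 0.1, momentum 0.9, batch size 256,
# 400 epochs with the learning rate divided by 10 at epochs 200 and 300.
steps_per_epoch = 50_000 // 256  # CIFAR training images / batch size
schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[200 * steps_per_epoch, 300 * steps_per_epoch],
    values=[0.1, 0.01, 0.001])
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule, momentum=0.9)
# The quoted weight decay (5·10⁻³ on CIFAR, 1·10⁻³ on ImageNet) is not
# handled by this optimizer; it would typically be applied via kernel
# regularizers or a decoupled-decay optimizer (e.g. SGDW from
# TensorFlow Addons).
```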