Exploiting Cyclic Symmetry in Convolutional Neural Networks

Authors: Sander Dieleman, Jeffrey De Fauw, Koray Kavukcuoglu

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the effect of these architectural modifications on three datasets which exhibit rotational symmetry and demonstrate improved performance with smaller models. ... 6. Experiments ... 6.1. Datasets ... 6.2. Experimental setup ... Table 2. Number of model parameters and results on the plankton dataset (cross-entropy, lower is better).
Researcher Affiliation | Industry | Sander Dieleman (SEDIELEM@GOOGLE.COM), Jeffrey De Fauw (DEFAUW@GOOGLE.COM), Koray Kavukcuoglu (KORAYK@GOOGLE.COM), Google DeepMind
Pseudocode | No | The paper describes the operations and summarises them in Table 1, but it does not contain structured pseudocode or algorithm blocks (a minimal sketch of the slicing and pooling operations follows the table).
Open Source Code | Yes | A fast GPU implementation of the rolling operation for Theano (using CUDA kernels) is available at https://github.com/benanne/kaggle-ndsb.
Open Datasets | Yes | The Plankton dataset (Cowen et al., 2015) consists of 30,336 grayscale images... The Galaxies dataset consists of 61,578 colour images... The Massachusetts buildings dataset (Mnih, 2013) consists of 1500 × 1500 aerial images...
Dataset Splits | Yes | We split this set into separate validation and training sets of 3,037 and 27,299 images respectively. [Plankton] ... We split the dataset into a validation set of 6,157 images and a training set of 55,421 images. [Galaxies] ... it was split into a training set of 137 images, a validation set of 4 images and a test set of 10 images. [Massachusetts buildings]
Hardware Specification | No | The paper mentions 'a fast GPU implementation' but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions 'Theano (using CUDA kernels)' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We use the Adam optimisation method (Kingma & Ba, 2014) for all experiments, because it allows us to avoid retuning learning rates when cyclic layers are inserted. We use discrete learning rate schedules with tenfold decreases near the end of training, following Krizhevsky et al. (2012). For the plankton dataset we also use weight decay for additional regularisation. We use data augmentation to reduce overfitting, including random rotation between 0° and 360°. ... We reduced the batch size used for training by a factor of 4. (A sketch of this configuration follows the table.)
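
Since the paper provides no pseudocode (see the Pseudocode row above), the following is a minimal NumPy sketch of the cyclic slicing and pooling operations it describes. The function names are our own, the pooling step is shown in the simple post-flatten placement where no feature-map realignment is needed, and the more involved rolling operation is left to the linked kaggle-ndsb repository; treat this as an illustration, not the authors' implementation.

```python
import numpy as np

def cyclic_slice(x):
    """Cyclic slicing: stack the four 90-degree rotations of each example
    along the batch axis. Shape (N, C, H, W) -> (4N, C, H, W)."""
    return np.concatenate([np.rot90(x, k, axes=(2, 3)) for k in range(4)], axis=0)

def cyclic_pool(feats, op=np.mean):
    """Cyclic pooling: combine the four rotated copies of each example.
    Applied here to flattened features of shape (4N, D), where the copies
    only need to be grouped, not spatially realigned. Returns (N, D)."""
    n = feats.shape[0] // 4
    return op(feats.reshape(4, n, -1), axis=0)

# Toy usage: a batch of 2 single-channel 8x8 images.
x = np.random.randn(2, 1, 8, 8)
sliced = cyclic_slice(x)        # shape (8, 1, 8, 8)
feats = sliced.reshape(8, -1)   # stand-in for a network's flattened output
pooled = cyclic_pool(feats)     # shape (2, 64)
```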
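
Likewise, since the Experiment Setup row quotes the optimiser, learning-rate schedule, and augmentation choices without code, here is a minimal PyTorch stand-in for that configuration. The paper used Theano; the toy model, epoch counts, learning rate, weight-decay value, and milestone epochs below are illustrative assumptions, not the paper's values.

```python
import torch
from torch import nn, optim

# Tiny placeholder model; the paper's architectures are far larger.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)

# Adam, with weight decay as the extra regulariser used for the plankton dataset.
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# Discrete learning-rate schedule with tenfold decreases near the end of training.
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 95], gamma=0.1)

# Dummy loader standing in for the real pipeline; random rotation between
# 0 and 360 degrees could be applied on the image side with
# torchvision.transforms.RandomRotation((0, 360)).
train_loader = [(torch.randn(8, 1, 32, 32), torch.randint(0, 10, (8,)))]

for epoch in range(100):
    for images, targets in train_loader:
        optimizer.zero_grad()
        loss = nn.functional.cross_entropy(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The choice of Adam matters here for the reason the quote gives: it let the authors insert cyclic layers without retuning learning rates.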