Steerable Partial Differential Operators for Equivariant Neural Networks

Authors: Erik Jenner, Maurice Weiler

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We use our solutions as equivariant drop-in replacements for convolutional layers and benchmark them in that role. Finally, we test our approach empirically by comparing steerable PDOs to steerable CNNs. In particular, we benchmark different discretization methods for the numerical implementation. Table 1: MNIST-rot results. Test errors ± standard deviations are averaged over six runs. Table 2: STL-10 results, again over six runs.
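To make the quoted point about "different discretization methods" concrete, the snippet below is a minimal, generic illustration of discretizing a differential operator into a convolution stencil (a central finite difference applied as a cross-correlation, the same operation a conv layer performs). It is a sketch only and not necessarily one of the discretization schemes benchmarked in the paper.

```python
import numpy as np
from scipy.ndimage import correlate

# Generic illustration: discretize d/dx with a central finite-difference
# stencil and apply it as a cross-correlation over a 2D grid.
x = np.linspace(0.0, 2.0 * np.pi, 64)
h = x[1] - x[0]                              # grid spacing
f = np.tile(np.sin(x), (64, 1))              # f(x, y) = sin(x), so df/dx = cos(x)

ddx_stencil = np.array([[-0.5, 0.0, 0.5]]) / h   # central difference along x

df_dx = correlate(f, ddx_stencil, mode="nearest")

# Away from the image boundary the stencil closely matches the analytic derivative.
err = np.abs(df_dx[:, 5:-5] - np.cos(x)[None, 5:-5]).max()
print(f"max interior error: {err:.4f}")      # O(h^2) discretization error
```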
Researcher Affiliation | Academia | Erik Jenner, University of Amsterdam, erik@ejenner.com; Maurice Weiler, University of Amsterdam, m.weiler.ml@gmail.com
Pseudocode | No | No pseudocode or algorithm block found.
Open Source Code | Yes | We have also implemented steerable PDOs for all subgroups of O(2) (https://github.com/ejnnr/steerable_pdos). Our code extends the E2CNN library (Weiler & Cesa, 2019), which will allow practitioners to easily use both steerable kernels and steerable PDOs within the same library, and even to combine both inside the same network. The code necessary to reproduce our experiments can be found at https://github.com/ejnnr/steerable_pdo_experiments.
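Since the released code extends the E2CNN library, the sketch below shows standard E2CNN usage of a rotation-equivariant convolution layer, which the PDO-based layers are meant to replace. The commented-out class name R2Diffop is an assumption (it is the name used in the later escnn library) and may differ in the ejnnr/steerable_pdos fork.

```python
import torch
from e2cnn import gspaces, nn as enn

# Standard E2CNN usage: a C8-equivariant convolution layer.
gspace = gspaces.Rot2dOnR2(N=8)
in_type = enn.FieldType(gspace, 3 * [gspace.trivial_repr])     # RGB input
out_type = enn.FieldType(gspace, 16 * [gspace.regular_repr])   # 16 regular fields

conv = enn.R2Conv(in_type, out_type, kernel_size=5, padding=2)

x = enn.GeometricTensor(torch.randn(1, 3, 32, 32), in_type)
y = conv(x)
print(y.tensor.shape)  # torch.Size([1, 128, 32, 32]) -- 16 regular fields of size 8

# The steerable-PDO fork (github.com/ejnnr/steerable_pdos) adds an analogous
# PDO-based layer usable as a drop-in replacement; the exact class name below
# is assumed and may differ in the fork:
# diffop = enn.R2Diffop(in_type, out_type, kernel_size=5, padding=2)
```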
Open Datasets | Yes | Rotated MNIST: We first benchmark steerable PDOs on rotated MNIST (Larochelle et al., 2007), which consists of MNIST images that have been rotated by different angles, with 12k train and 50k test images. STL-10: The rotated MNIST dataset has global rotational symmetry by design, so it is unsurprising that equivariant models perform well. But interestingly, rotation equivariance can also help for natural images without global rotational symmetry (Weiler & Cesa, 2019; Shen et al., 2020). We therefore benchmark steerable PDOs on STL-10 (Coates et al., 2011), where we only use the labeled portion of 5000 training images. We use the dataset provided by Ribeiro et al. (2020).
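As a hedged sketch of how these datasets can be obtained: STL-10 ships with torchvision (the paper itself uses the version provided by Ribeiro et al. (2020)), and rotated MNIST is commonly distributed as an .amat text file with 784 pixel values plus a label per row. The .amat filename below is the common distribution name and may differ from the authors' copy.

```python
import numpy as np
import torchvision

# STL-10: split="train" is the labeled 5k-image portion used in the paper.
stl10_train = torchvision.datasets.STL10(root="./data", split="train", download=True)
print(len(stl10_train))  # 5000

# Rotated MNIST (Larochelle et al., 2007), standard .amat distribution (assumed name).
data = np.loadtxt("mnist_all_rotation_normalized_float_train_valid.amat")
images = data[:, :-1].reshape(-1, 28, 28)   # 12k train + validation images
labels = data[:, -1].astype(np.int64)
```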
Dataset Splits | Yes | For the final training runs, we used the entire set of 12k training plus validation images, as is common practice on MNIST-rot.
Hardware Specification | Yes | We performed our experiments on an internal cluster with a GeForce RTX 2080 Ti and 6 CPU cores.
Software Dependencies | No | The paper mentions "Adam (Kingma & Ba, 2015)", "Wide-ResNet-16-8 (Zagoruyko & Komodakis, 2016)", "Cutout (DeVries & Taylor, 2017)", and the "E2CNN library" but does not specify version numbers for these software components or the programming languages used.
Experiment Setup | Yes | All models were trained for 30 epochs with hyperparameters based on those of Weiler & Cesa (2019), though we changed the learning rate schedule and regularization slightly because this improved performance for all models, including kernel-based ones. The training data is augmented with random rotations. Precise descriptions of the architecture and hyperparameters can be found in Appendix O. We trained all MNIST-rot models for 30 epochs with Adam (Kingma & Ba, 2015) and a batch size of 64. The training data was normalized and augmented using random rotations. The initial learning rate was 0.05, which was decayed exponentially after a burn-in of 5 epochs at a rate of 0.7 per epoch. We used a dropout of 0.5 after the fully connected layer, and a weight decay of 1e-7.
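The quoted hyperparameters translate into a straightforward PyTorch training loop, sketched below. A tiny placeholder classifier and plain torchvision MNIST are used so the snippet runs end to end; the actual experiments train equivariant architectures on MNIST-rot (see Appendix O of the paper), and the normalization statistics and rotation range are assumptions.

```python
import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.RandomRotation(180),              # random-rotation augmentation (range assumed)
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,)),  # standard MNIST stats (assumed)
])
train_set = datasets.MNIST("./data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = torch.nn.Sequential(                     # placeholder, not the paper's network
    torch.nn.Flatten(),
    torch.nn.Linear(28 * 28, 64), torch.nn.ReLU(),
    torch.nn.Dropout(0.5),                       # dropout after the fully connected layer
    torch.nn.Linear(64, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=0.05, weight_decay=1e-7)
# Exponential decay at a rate of 0.7 per epoch after a 5-epoch burn-in.
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: 1.0 if e < 5 else 0.7 ** (e - 5)
)

for epoch in range(30):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.cross_entropy(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```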