Learning Layer-wise Equivariances Automatically using Gradients

Authors: Tycho van der Ouderaa, Alexander Immer, Mark van der Wilk

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the ability to automatically learn layer-wise equivariances on image classification tasks, achieving equivalent or improved performance over baselines with hard-coded symmetry." "We demonstrate automatically learning layer-wise symmetry structure on image classification tasks." (Section 6, Experiments: 6.1 Toy problem: adapting symmetry to task; 6.2 Learning to use layer-wise equivariant convolutions on CIFAR-10; 6.3 Selecting symmetry from multiple groups)
Researcher Affiliation | Academia | (1) Department of Computing, Imperial College London, United Kingdom; (2) Department of Computer Science, ETH Zurich, Switzerland; (3) Max Planck Institute for Intelligent Systems, Tübingen, Germany; (4) Department of Computer Science, University of Oxford, United Kingdom
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks clearly labeled as such.
Open Source Code | Yes | "Code accompanying this paper is available at https://github.com/tychovdo/ella"
Open Datasets | Yes | "We slightly altered the MNIST dataset [Le Cun et al., 1989] so that digits are randomly placed in one of four image quadrants..." and "Convolutional layers provide useful inductive bias for image classification tasks, such as CIFAR-10 [Krizhevsky et al., 2009]." (A sketch reconstructing the quadrant-MNIST alteration follows the table.)
Dataset Splits | No | The paper mentions "training data" and "validation data" in general terms but does not give the specific splits (percentages, counts, or split methodology) for training, validation, and testing needed to reproduce the experiments.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | "We follow Finzi et al. [2021a] and use the architecture of Neyshabur [2020] (see App. F). In Table 2, we compare test performance when using fully-connected FC, convolutional CONV, residual pathways FC+CONV, and the proposed more parameter-efficient factorisations F-FC and F-FC+CONV, and sparsified S layers. ... We compensate for the added parameters resulting from the rotationally equivariant GCONV path by reducing channel sizes of individual paths by a factor of 5 (α=10 to α=2, see App. F)." (A structural sketch of the FC+CONV residual pathway follows below.)
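
The quadrant-MNIST alteration quoted in the Open Datasets row is straightforward to reconstruct. Below is a minimal sketch, assuming PyTorch/torchvision; the 56×56 canvas size and the class name `QuadrantMNIST` are illustrative assumptions on our part, and the paper's exact preprocessing may differ:

```python
import torch
from torchvision import datasets, transforms

class QuadrantMNIST(torch.utils.data.Dataset):
    """MNIST digits pasted into a random quadrant of a larger canvas.

    Sketch of the altered-MNIST setup described in the paper; the
    56x56 canvas size is an assumption made for illustration.
    """

    def __init__(self, root="data", train=True, canvas=56):
        self.base = datasets.MNIST(root, train=train, download=True,
                                   transform=transforms.ToTensor())
        self.canvas = canvas

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        img, label = self.base[idx]              # img: (1, 28, 28)
        half = self.canvas // 2                  # quadrant side length
        out = torch.zeros(1, self.canvas, self.canvas)
        q = torch.randint(4, (1,)).item()        # pick one of four quadrants
        row, col = divmod(q, 2)
        out[:, row * half: row * half + 28, col * half: col * half + 28] = img
        return out, label
```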
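The FC+CONV "residual pathways" layer named in the Experiment Setup row places an unconstrained fully-connected path in parallel with a (translation-equivariant) convolutional path and sums their outputs. Below is a structural sketch only, under the assumption of image-shaped inputs: in the paper, per-path prior variances are learned with a differentiable marginal-likelihood objective that decides how much each path is used, which is omitted here.

```python
import torch
import torch.nn as nn

class ResidualPathway(nn.Module):
    """Sum of a fully-connected and a convolutional path (FC+CONV).

    Structural sketch: the paper additionally learns per-path prior
    variances via a marginal-likelihood objective, not shown here.
    """

    def __init__(self, channels, size, kernel_size=3):
        super().__init__()
        features = channels * size * size
        self.fc = nn.Linear(features, features)       # unconstrained path
        self.conv = nn.Conv2d(channels, channels,     # equivariant path
                              kernel_size, padding=kernel_size // 2)

    def forward(self, x):                             # x: (B, C, H, W)
        fc_out = self.fc(x.flatten(1)).view_as(x)
        return fc_out + self.conv(x)

# Usage (shapes are illustrative):
# layer = ResidualPathway(channels=16, size=8)
# y = layer(torch.randn(4, 16, 8, 8))                # y: (4, 16, 8, 8)
```

The quoted channel-size compensation (α=10 to α=2) would correspond to shrinking `channels` for each path so that the total parameter count stays comparable after adding the rotationally equivariant GCONV path; the exact scaling is detailed in the paper's App. F.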