Equivariance-aware Architectural Optimization of Neural Networks

Authors: Kaitlin Maile, Dennis George Wilson, Patrick Forré

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments across a variety of datasets show the benefit of dynamically constrained equivariance to find effective architectures with approximate equivariance.
Researcher Affiliation | Academia | Kaitlin Maile, IRIT, University of Toulouse (kaitlin.maile@irit.fr); Dennis G. Wilson, ISAE-SUPAERO, University of Toulouse (dennis.wilson@isae-supaero.fr); Patrick Forré, University of Amsterdam (p.d.forre@uva.nl)
Pseudocode | Yes | Algorithm 1, Evolutionary equivariance-aware neural architecture search, procedure EQUINASE(initial symmetry group G); and Algorithm 2, Differentiable equivariance-aware neural architecture search, procedure EQUINASD(set of groups [G]). A hedged sketch of the evolutionary loop is given below the table.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets | Yes | The Rotated MNIST dataset (Larochelle et al., 2007, Rot MNIST) is a version of the MNIST handwritten digit dataset with the images rotated by arbitrary angles. The Galaxy10 DECals dataset (Leung & Bovy, 2019, Galaxy10) contains galaxy images in 10 broad categories. The ISIC 2019 dataset (Codella et al., 2018; Tschandl et al., 2018; Combalia et al., 2019, ISIC) contains dermoscopic images of 8 types of skin cancer plus a null class.
Dataset Splits | Yes | For Rot MNIST and MNIST, we use the standard training and test splits with a batch size of 64, reserving 10% of the training data as the validation set. For Galaxy10, we set aside 10% of the dataset as the test set, reserving 10% of the remaining training data as the validation set. For ISIC, we set aside 10% of the available training dataset as the test set, reserving 10% of the remaining data as the validation set and the rest as training data. A sketch of these splits is given below the table.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running experiments.
Software Dependencies | No | The paper mentions using SGD and Adam optimizers but does not provide specific version numbers for any key software components or libraries (e.g., PyTorch, TensorFlow, Python version).
Experiment Setup | Yes | The learning rates were selected by grid search over baselines on Rot MNIST. For all experiments in Section 6.1, we use a simple SGD optimizer with learning rate 0.1 to avoid confounding effects such as momentum during the morphism. For EquiNASE, the parent selection size is 5, the training time per generation is 0.5 epochs, and the number of generations is 50 for all tasks. ... For all experiments in Section 6.2, we use separate Adam optimizers for Ψ and Z, each with a learning rate of 0.01 and otherwise default settings. The total training time is 100 epochs for Rot MNIST and 50 epochs for Galaxy10 and ISIC. Sketches of these settings are given below the table.
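
The following is a minimal Python sketch in the spirit of Algorithm 1 (EQUINASE), referenced in the Pseudocode row above. It only illustrates the reported hyperparameters (5 parents, 0.5 epochs of training per generation, 50 generations); the function names relax_constraint, train_briefly, and validate are hypothetical placeholders, not the paper's API, and the exact ordering of mutation, training, and selection is an assumption.

    # Hedged sketch of an evolutionary equivariance-aware NAS loop (EquiNASE-style).
    # relax_constraint, train_briefly, and validate are hypothetical callables supplied
    # by the user; they are not defined in the paper's text quoted above.
    import copy

    def evolutionary_equinas(initial_model,
                             relax_constraint,   # proposes children with loosened layer-wise equivariance
                             train_briefly,      # trains a model for a fraction of an epoch (0.5 per generation here)
                             validate,           # returns a validation score, higher is better
                             parent_selection_size=5,
                             num_generations=50):
        """Evolve a population of networks whose equivariance constraints are
        gradually relaxed, keeping the best-scoring candidates each generation."""
        population = [initial_model]
        for _ in range(num_generations):
            # Keep the top-scoring parents from the current population.
            population.sort(key=validate, reverse=True)
            parents = population[:parent_selection_size]

            # Each parent proposes children with strictly less constrained layers.
            children = []
            for parent in parents:
                children.extend(relax_constraint(copy.deepcopy(parent)))

            # Briefly train parents and children before the next selection round.
            population = parents + children
            for model in population:
                train_briefly(model)
        return max(population, key=validate)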
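
The next sketch corresponds to the Dataset Splits row: it reproduces the quoted 10% hold-out proportions and batch size 64 with torch.utils.data.random_split. The function name make_splits, the seed, and the loader settings are assumptions for illustration only.

    # Hedged sketch of the described train/val/test splits using PyTorch utilities.
    import torch
    from torch.utils.data import DataLoader, random_split

    def make_splits(full_dataset, has_official_test_split, batch_size=64, seed=0):
        g = torch.Generator().manual_seed(seed)

        if has_official_test_split:
            # Rot MNIST / MNIST: the benchmark provides the test set separately.
            train_set, test_set = full_dataset, None
        else:
            # Galaxy10 / ISIC: first hold out 10% of all data as the test set.
            n_test = len(full_dataset) // 10
            train_set, test_set = random_split(
                full_dataset, [len(full_dataset) - n_test, n_test], generator=g)

        # Reserve 10% of the remaining training data as the validation set.
        n_val = len(train_set) // 10
        train_set, val_set = random_split(
            train_set, [len(train_set) - n_val, n_val], generator=g)

        loaders = {
            "train": DataLoader(train_set, batch_size=batch_size, shuffle=True),
            "val": DataLoader(val_set, batch_size=batch_size),
        }
        if test_set is not None:
            loaders["test"] = DataLoader(test_set, batch_size=batch_size)
        return loaders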
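
Finally, a sketch of the optimizer settings quoted in the Experiment Setup row. The placeholder network and the toy stand-ins for the Ψ and Z parameter groups are hypothetical; only the learning rates, the absence of momentum, and the epoch budgets come from the quoted text.

    # Hedged sketch of the reported optimizer configurations.
    import torch
    from torch import nn, optim

    # Section 6.1 setting: plain SGD, lr 0.1, no momentum (to avoid confounding
    # effects during the network morphism).
    model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # placeholder network
    sgd = optim.SGD(model.parameters(), lr=0.1, momentum=0.0)

    # Section 6.2 setting: separate Adam optimizers for the parameter groups the
    # paper calls Ψ and Z, each with lr 0.01 and otherwise default settings.
    psi_params = list(model.parameters())            # stand-in for Ψ
    z_params = [nn.Parameter(torch.zeros(4))]        # stand-in for Z (per-group weights)
    adam_psi = optim.Adam(psi_params, lr=0.01)
    adam_z = optim.Adam(z_params, lr=0.01)

    # Training budgets from the quoted setup.
    epochs = {"rot_mnist": 100, "galaxy10": 50, "isic": 50}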