Investigating how ReLU-networks encode symmetries

Authors: Georg Bökman, Fredrik Kahl

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform quantitative experiments with VGG-nets on CIFAR10 and qualitative experiments with ResNets on ImageNet to illustrate and support our theoretical findings. These experiments are not only of interest for understanding how group equivariance is encoded in ReLU-networks, but they also give a new perspective on Entezari et al.'s permutation conjecture, as we find that it is typically easier to merge a network with a group-transformed version of itself than to merge two different networks.
Researcher Affiliation | Academia | Georg Bökman, Fredrik Kahl; Chalmers University of Technology; {bokman, fredrik.kahl}@chalmers.se
Pseudocode | No | The paper presents mathematical derivations and theoretical concepts but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | We provide code for merging networks with their flipped selves at https://github.com/georg-bn/layerwise-equivariance.
Open Datasets | Yes | We train VGG11 nets [33] on CIFAR10 [23]. Next we look at the GCNN barrier for ResNet50 [21] trained on ImageNet [11].
Dataset Splits | No | The paper mentions using the CIFAR10 and ImageNet datasets but does not explicitly provide details about specific training, validation, and test splits (e.g., percentages, sample counts, or explicit mention of a validation set).
Hardware Specification | Yes | The nets were trained on NVIDIA T4 GPUs on a computing cluster. We train on NVIDIA A100 GPUs on a computing cluster.
Software Dependencies | No | The paper mentions using Torchvision (implying PyTorch) but does not specify version numbers for any software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup | No | The paper describes the loss functions used (invariance loss and cross-entropy), the training strategy (e.g., 'trained with horizontal flipping data augmentation', 'trained using C.-E. and inv-loss after 20% of the epochs'), and the number of nets trained ('train 24 VGG11 nets for each model type'), but does not provide specific numerical hyperparameters such as learning rates, batch sizes, or exact total epoch counts.
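The Research Type row above mentions Entezari et al.'s permutation conjecture and the finding that merging a network with a group-transformed version of itself is typically easier than merging two independently trained networks. The sketch below illustrates the underlying merge-by-permutation idea for a one-hidden-layer net in NumPy. It is our simplified illustration, not the authors' code: the function name is ours, and the greedy matching stands in for the proper linear-assignment alignment used in practice.

```python
import numpy as np

def align_and_merge(W1_a, W2_a, W1_b, W2_b):
    """Merge two one-hidden-layer nets y = W2 @ relu(W1 @ x):
    permute B's hidden units to best match A's, then average weights.
    Greedy matching on first-layer row similarity is a simplification;
    real implementations solve a linear assignment problem."""
    sim = W1_a @ W1_b.T                      # unit-to-unit similarity
    perm = np.full(len(W1_a), -1)
    taken = np.zeros(len(W1_b), dtype=bool)
    for i in np.argsort(-sim.max(axis=1)):   # most confident rows first
        j = int(np.argmax(np.where(taken, -np.inf, sim[i])))
        perm[i], taken[j] = j, True
    # permuting hidden units permutes rows of W1 and columns of W2
    return 0.5 * (W1_a + W1_b[perm]), 0.5 * (W2_a + W2_b[:, perm])

# sanity check: merging a net with a hidden-unit permutation of itself
# should recover the original net exactly
rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 64)), rng.normal(size=(3, 4))
p = np.array([2, 0, 3, 1])
W1_m, W2_m = align_and_merge(W1, W2, W1[p], W2[:, p])
print(np.allclose(W1_m, W1), np.allclose(W2_m, W2))
```

In the paper's setting, the "second network" can be a group-transformed (e.g., horizontally flipped) copy of the first, which is what makes the merge easier than merging two independently trained nets.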
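The Experiment Setup row refers to an invariance loss that penalizes a network for responding differently to an input and its horizontally flipped copy. A minimal NumPy sketch of such a loss follows; the function names are ours and this generic mean-squared form is an assumption, not necessarily the exact loss used in the paper.

```python
import numpy as np

def horizontal_flip(x):
    # flip the width axis of an image batch shaped (N, C, H, W)
    return x[..., ::-1]

def invariance_loss(f, x):
    # mean squared difference between outputs on x and its flipped copy;
    # minimizing this pushes f toward horizontal-flip invariance
    return float(np.mean((f(x) - f(horizontal_flip(x))) ** 2))

# global average pooling is exactly flip-invariant, so its loss vanishes,
# while the identity map is not invariant and incurs a positive loss
gap = lambda x: x.mean(axis=(2, 3))
x = np.random.default_rng(0).normal(size=(2, 3, 8, 8))
print(invariance_loss(gap, x))              # near 0
print(invariance_loss(lambda z: z, x) > 0)  # True
```

In training, such a term would be added to the cross-entropy loss with some weighting, which is one of the unspecified hyperparameters noted above.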