Investigating how ReLU-networks encode symmetries
Authors: Georg Bökman, Fredrik Kahl
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform quantitative experiments with VGG-nets on CIFAR10 and qualitative experiments with ResNets on ImageNet to illustrate and support our theoretical findings. These experiments are not only of interest for understanding how group equivariance is encoded in ReLU-networks, but they also give a new perspective on Entezari et al.'s permutation conjecture as we find that it is typically easier to merge a network with a group-transformed version of itself than merging two different networks. (A sketch of the barrier estimate behind this comparison follows the table.) |
| Researcher Affiliation | Academia | Georg Bökman, Fredrik Kahl, Chalmers University of Technology, {bokman, fredrik.kahl}@chalmers.se |
| Pseudocode | No | The paper presents mathematical derivations and theoretical concepts but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We provide code for merging networks with their flipped selves at https://github.com/georg-bn/layerwise-equivariance. |
| Open Datasets | Yes | We train VGG11 nets [33] on CIFAR10 [23]. Next we look at the GCNN barrier for ResNet50 [21] trained on ImageNet [11]. |
| Dataset Splits | No | The paper mentions using the CIFAR10 and ImageNet datasets but does not explicitly provide details about specific training, validation, and test splits (e.g., percentages, sample counts, or explicit mention of a validation set). |
| Hardware Specification | Yes | The nets were trained on NVIDIA T4 GPUs on a computing cluster. We train on NVIDIA A100 GPUs on a computing cluster. |
| Software Dependencies | No | The paper mentions using Torchvision (implying PyTorch) but does not specify version numbers for any software dependencies like Python, PyTorch, or CUDA. |
| Experiment Setup | No | The paper describes the loss functions used (invariance loss and cross-entropy), the training strategy (e.g., 'trained with horizontal flipping data augmentation', 'trained using C.-E. and inv-loss after 20% of the epochs'), and the number of nets trained ('train 24 VGG11 nets for each model type'), but does not provide specific numerical hyperparameters such as learning rates, batch sizes, or exact total epoch counts. (A hedged sketch of such a loss combination follows the table.) |
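
The Research Type row compares merging a network with a group-transformed copy of itself against merging two independently trained networks. That comparison rests on estimating a loss barrier along the linear path between two weight configurations, the quantity studied in Entezari et al.'s permutation conjecture. The full pipeline, including the permutation alignment step, lives in the authors' repository; the sketch below only illustrates the standard barrier estimate, and the helper names (`interpolate_state_dicts`, `loss_barrier`, `eval_fn`) are hypothetical rather than taken from the paper.

```python
import torch

@torch.no_grad()
def interpolate_state_dicts(sd_a, sd_b, alpha):
    # Convex combination of two weight configurations with identical keys.
    # Assumes floating-point parameters and buffers.
    return {k: (1.0 - alpha) * sd_a[k] + alpha * sd_b[k] for k in sd_a}

@torch.no_grad()
def loss_barrier(model, sd_a, sd_b, eval_fn, num_points=11):
    # Barrier along the linear path: the largest excess of the interpolated
    # loss over the linear interpolation of the two endpoint losses.
    alphas = torch.linspace(0.0, 1.0, num_points)
    losses = []
    for alpha in alphas:
        model.load_state_dict(interpolate_state_dicts(sd_a, sd_b, alpha.item()))
        losses.append(eval_fn(model))  # eval_fn returns a scalar test loss
    losses = torch.tensor(losses)
    endpoints = (1.0 - alphas) * losses[0] + alphas * losses[-1]
    return (losses - endpoints).max().item()
```

In the paper's setting, `sd_b` would be the permutation-aligned state dict of a group-transformed (e.g., horizontally flipped) copy of the same network, rather than an independently trained one.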
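
The Experiment Setup row mentions training with cross-entropy plus an "inv-loss", but the rows above do not pin down the exact form of that loss. The following is a minimal sketch under the assumption that the invariance term penalizes output differences under horizontal flips; `combined_loss` and `inv_weight` are hypothetical names, and the weighting value is not reported.

```python
import torch
import torch.nn.functional as F

def combined_loss(model, x, y, inv_weight=1.0):
    # Standard classification term.
    logits = model(x)
    ce = F.cross_entropy(logits, y)
    # Hypothetical invariance term: penalize the gap between the model's
    # outputs on an image batch and on its horizontally flipped version
    # (flip along the width axis of an NCHW batch).
    logits_flipped = model(torch.flip(x, dims=[-1]))
    inv = F.mse_loss(logits, logits_flipped)
    return ce + inv_weight * inv
```

Per the quoted setup, such a term would only be switched on after roughly 20% of the epochs; that schedule, like the weighting, is an implementation detail left to the authors' repository.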