On the Symmetries of Deep Learning Models and their Internal Representations

Authors: Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our work suggests that the symmetries of a network are propagated into the symmetries of that network's representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than arbitrary linear combinations thereof. (See the first code sketch below.)
Researcher Affiliation | Collaboration | (1) Pacific Northwest National Laboratory; (2) Department of Mathematics, University of Washington; (3) Department of Mathematics, Colorado State University; (4) Department of Mathematical Sciences, University of Texas at El Paso
Pseudocode | No | The paper refers to an 'algorithm that computes G_{σ_n}' in Section 3.1, but no pseudocode block or clearly labeled algorithm section is present in the main paper or appendices.
Open Source Code | No | We are in the process of making code publicly available.
Open Datasets | Yes | We conduct experiments stitching networks at ReLU activation layers with the stitching layer restricted to elements of the group G_{ReLU}, showing in fig. 1 that one can stitch CNNs on CIFAR-10 [Kri09]... ...We compare three models trained on ImageNet. (See the stitching sketch below.)
Dataset Splits | No | The paper mentions using 'validation accuracy' and 'lowest validation loss' (Section 4 and Appendix D.1), implying a validation set was used. However, it only explicitly details a split of 50,000 training images and 10,000 test images for CIFAR-10 (Appendix D.1) and does not specify how the validation set was constructed or its size.
Hardware Specification | Yes | All models were trained on NVIDIA A100 GPUs.
Software Dependencies | No | The paper states: 'All models were trained using PyTorch [Pas+19].' (Appendix D.1). However, it does not give a version number for PyTorch or any other software dependency, which reproducibility requires.
Experiment Setup | Yes | We train our models with SGD (for Myrtle CNNs) and Adam (for ResNet20s), using the same hyperparameters as described in [Pag18] and [Kri09] respectively. We use a batch size of 128 and run each experiment for 50 epochs. (See the training sketch below.)
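The Research Type row quotes the paper's claim that intertwiner groups of ReLU networks privilege the activation basis. As a minimal numerical sketch (our own, not the authors' unreleased code), the following NumPy check illustrates the algebraic fact behind G_{ReLU}: a permutation composed with a positive diagonal scaling commutes with elementwise ReLU, so such matrices intertwine ReLU with itself.

```python
# Minimal check: for A = P @ D with P a permutation matrix and D a positive
# diagonal, ReLU(A x) = A ReLU(x), i.e. A lies in the intertwiner group
# G_{ReLU}. This is a sketch of the property, not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
n = 5

P = np.eye(n)[rng.permutation(n)]           # random permutation matrix
D = np.diag(rng.uniform(0.5, 2.0, size=n))  # positive diagonal scaling
A = P @ D

x = rng.normal(size=n)                      # arbitrary input vector
relu = lambda v: np.maximum(v, 0.0)

# Positive scaling commutes with ReLU, and a permutation only reorders
# coordinates, so the two sides agree up to floating-point error.
assert np.allclose(relu(A @ x), A @ relu(x))
print("A = P @ D intertwines ReLU with itself")
```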
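The Open Datasets row quotes the paper's stitching experiments, where the stitching layer is restricted to elements of G_{ReLU}. Since the code was not yet released, the parameterization below is a hypothetical PyTorch sketch: a fixed channel permutation composed with learnable strictly positive per-channel scales.

```python
# Hypothetical sketch of a stitching layer constrained to G_{ReLU}, i.e. a
# permutation composed with positive channel-wise scaling. The restriction to
# this group is from the paper; the parameterization (fixed permutation,
# positivity via exp) is our assumption, not the authors' implementation.
import torch
import torch.nn as nn

class GReLUStitch(nn.Module):
    def __init__(self, num_channels: int, permutation: torch.Tensor):
        super().__init__()
        # `permutation` is a fixed index tensor of shape (num_channels,),
        # e.g. chosen by matching activations between the two networks.
        self.register_buffer("permutation", permutation)
        # Learn log-scales so the diagonal factor stays strictly positive.
        self.log_scale = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width) feature map at a ReLU layer.
        scale = self.log_scale.exp().view(1, -1, 1, 1)
        return scale * x[:, self.permutation]

# Usage: splice the front of one network into the back of another.
stitch = GReLUStitch(num_channels=64, permutation=torch.randperm(64))
features = torch.relu(torch.randn(8, 64, 16, 16))  # placeholder activations
out = stitch(features)
```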
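The Experiment Setup row reports the optimizer choices, batch size, and epoch count. The sketch below records only those stated settings; learning rates and schedules are deferred by the paper to [Pag18] and [Kri09], so the values here are placeholders rather than the authors' hyperparameters.

```python
# Sketch of the reported configuration: SGD for Myrtle CNNs, Adam for
# ResNet20s, batch size 128, 50 epochs. Learning rates below are
# placeholders; the paper defers them to [Pag18] / [Kri09].
import torch

def make_optimizer(model: torch.nn.Module, arch: str) -> torch.optim.Optimizer:
    if arch == "myrtle_cnn":
        return torch.optim.SGD(model.parameters(), lr=0.1)    # placeholder lr
    elif arch == "resnet20":
        return torch.optim.Adam(model.parameters(), lr=1e-3)  # placeholder lr
    raise ValueError(f"unknown architecture: {arch}")

BATCH_SIZE = 128  # as reported in the paper
EPOCHS = 50       # as reported in the paper
```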