On the Symmetries of Deep Learning Models and their Internal Representations
Authors: Charles Godfrey, Davis Brown, Tegan Emerson, Henry Kvinge
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our work suggests that the symmetries of a network are propagated into the symmetries in that network's representation of data, providing us with a better understanding of how architecture affects the learning and prediction process. Finally, we speculate that for ReLU networks, the intertwiner groups may provide a justification for the common practice of concentrating model interpretability exploration on the activation basis in hidden layers rather than arbitrary linear combinations thereof. |
| Researcher Affiliation | Collaboration | 1Pacific Northwest National Laboratory, 2Department of Mathematics, University of Washington, 3Department of Mathematics, Colorado State University, 4Department of Mathematical Sciences, University of Texas at El Paso |
| Pseudocode | No | The paper refers to an 'algorithm that computes G_{σ_n}' in Section 3.1, but no pseudocode block or clearly labeled algorithm section is present in the main paper or appendices. |
| Open Source Code | No | We are in the process of making code publicly available. |
| Open Datasets | Yes | We conduct experiments stitching networks at ReLU activation layers with the stitching layer restricted to elements of the group G_ReLU, showing in fig. 1 that one can stitch CNNs on CIFAR-10 [Kri09]... ...We compare three models trained on ImageNet. (A hedged sketch of such a G_ReLU-restricted stitching layer follows the table.) |
| Dataset Splits | No | The paper mentions using 'validation accuracy' and 'lowest validation loss' (Section 4 and Appendix D.1), implying a validation set was used. However, it only explicitly details a split of 50,000 training images and 10,000 test images for CIFAR-10 (Appendix D.1) and does not specify how the validation set was created or its size. |
| Hardware Specification | Yes | All models were trained on NVIDIA A100 GPUs. |
| Software Dependencies | No | The paper states: 'All models were trained using PyTorch [Pas+19].' (Appendix D.1). However, it does not provide a specific version number for PyTorch or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We train our models with SGD (for Myrtle CNNs) and Adam (for ResNet20s), using the same hyperparameters as described in [Pag18] and [Kri09] respectively. We use a batch size of 128 and run each experiment for 50 epochs. (A minimal training-loop sketch based on these details follows the table.) |
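
The stitching experiments quoted in the Open Datasets row constrain the stitching layer to the ReLU intertwiner group G_ReLU, i.e. channel permutations composed with positive per-channel scalings. Below is a minimal PyTorch sketch of such a constrained stitching layer; the class name `ReLUIntertwinerStitch` and the parameterization (a fixed permutation plus scales learned in log-space) are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn


class ReLUIntertwinerStitch(nn.Module):
    """Stitching layer restricted to the ReLU intertwiner group:
    a channel permutation composed with a positive per-channel scaling.
    Class name and parameterization are illustrative assumptions."""

    def __init__(self, num_channels: int, permutation: torch.Tensor):
        super().__init__()
        # permutation: LongTensor of shape (num_channels,) giving the channel reordering
        self.register_buffer("permutation", permutation)
        # learn scales in log-space so they remain strictly positive
        self.log_scale = nn.Parameter(torch.zeros(num_channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, H, W) activations taken from the first network
        x = x[:, self.permutation]                      # permute channels
        scale = self.log_scale.exp().view(1, -1, 1, 1)  # positive diagonal scaling
        return x * scale                                # feed result into the second network
```

In a stitching experiment this module would stand in for the usual unconstrained linear (e.g. 1x1 convolution) stitching layer between the frozen front half of one network and the frozen back half of another.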
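For the Experiment Setup row, a minimal training loop consistent with the quoted details (SGD for Myrtle CNNs, batch size 128, 50 epochs on CIFAR-10) might look like the following. The learning rate, momentum, and augmentation choices are assumptions, since the paper defers those hyperparameters to [Pag18]; the model definition is not reproduced here.

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms


def train_cifar10(model: torch.nn.Module, device: str = "cuda") -> None:
    # Standard torchvision CIFAR-10 loader; the augmentation pipeline is an assumption
    transform = transforms.Compose([
        transforms.RandomCrop(32, padding=4),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
    ])
    train_set = datasets.CIFAR10("data", train=True, download=True, transform=transform)
    loader = DataLoader(train_set, batch_size=128, shuffle=True)  # batch size 128, per the paper

    # SGD for Myrtle CNNs, per the paper; lr and momentum values are assumptions
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()

    model.to(device).train()
    for epoch in range(50):  # 50 epochs, per the paper
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss_fn(model(images), labels).backward()
            optimizer.step()
```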