Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks
Authors: Hidenori Tanaka, Daniel Kunin
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E). |
| Researcher Affiliation | Collaboration | Hidenori Tanaka¹, Daniel Kunin². ¹Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA. ²Stanford University, Stanford, CA, USA |
| Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository for the described methodology. |
| Open Datasets | Yes | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E). |
| Dataset Splits | No | The paper mentions training on 'Tiny-ImageNet' and evaluating 'test accuracy', but does not provide specific details on dataset splits (e.g., percentages, sample counts, or explicit splitting methodology). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions). |
| Experiment Setup | Yes | When trained with a constant small learning rate (0.001), the final test accuracy (A-1) and loss dynamics (A-2) of models with unconstrained filter norm (pink) and constrained filter norm (blue) are almost identical. However, in the large learning rate regime (0.03), the models trained with unconstrained filter norm (pink) outperform the models trained with constrained filter norm (blue), demonstrating the existence and benefits of implicit adaptive optimization. Similarly, we confirm our prediction that the presence of momentum (β = 0.9) is essential to amplify the effects of implicit adaptive optimization, as seen in test accuracy (B-1) and loss dynamics (B-2). Finally, we demonstrate the benefits of symmetry breaking due to weight decay k, which acts in analogy with the discounting factor for the cumulative gradient norms of RMSProp. Indeed, the final test accuracy (C-1) and the learning dynamics of the loss (C-2) are both enhanced by the presence of weight decay k, in agreement with [39]. |
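
Since the paper releases no code, the following is a minimal PyTorch sketch of the experiment-setup comparison quoted above. Only the details drawn from the paper are fixed (VGG11 with batch normalization, Tiny-ImageNet's 200 classes, learning rates 0.001 and 0.03, momentum 0.9, weight decay on or off); the helper names `project_filter_norms` and `train_step` and the weight-decay value are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the reported setup (assumptions: PyTorch/torchvision;
# helper names and the weight-decay value are illustrative, not from the paper).
import torch
import torch.nn.functional as F
from torchvision.models import vgg11_bn


def project_filter_norms(model, target_norm=1.0):
    """Rescale every conv filter to a fixed norm after each step, emulating
    the 'constrained filter norm' (blue) condition; the 'unconstrained'
    (pink) condition simply skips this projection."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, torch.nn.Conv2d):
                w = m.weight
                norms = w.flatten(1).norm(dim=1).clamp_min(1e-12)
                w.mul_(target_norm / norms.view(-1, 1, 1, 1))


# VGG11 with batch normalization; Tiny-ImageNet has 200 classes.
model = vgg11_bn(num_classes=200)

# Hyperparameters quoted in the table: lr in {0.001, 0.03}, momentum 0.9,
# and weight decay toggled on/off (5e-4 is a placeholder value).
optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.03, momentum=0.9, weight_decay=5e-4)


def train_step(x, y, constrained=False):
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()
    if constrained:
        project_filter_norms(model)  # the blue curves in Fig. 4 (A)
    return loss.item()
```

Running `train_step(..., constrained=True)` against the unconstrained run at both learning rates mirrors the comparison the quote describes: at lr = 0.001 the two conditions should behave almost identically, while at lr = 0.03 the unconstrained run should benefit from the implicit adaptive optimization enabled by the broken scale symmetry.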