Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Authors: Hidenori Tanaka, Daniel Kunin

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E). (A minimal training sketch follows the table.)
Researcher Affiliation | Collaboration | Hidenori Tanaka (Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA); Daniel Kunin (Stanford University, Stanford, CA, USA)
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository for the described methodology.
Open Datasets | Yes | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E).
Dataset Splits | No | The paper mentions training on Tiny-ImageNet and evaluating test accuracy, but does not provide specific details on dataset splits (e.g., percentages, sample counts, or explicit splitting methodology).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions).
Experiment Setup | Yes | When trained with a constant small learning rate (0.001), the final test accuracy (A-1) and loss dynamics (A-2) of models with unconstrained filter norm (pink) and constrained filter norm (blue) are almost identical. However, in a large learning rate regime (0.03), the models trained with unconstrained filter norm (pink) outperform the models trained with constrained filter norm (blue), demonstrating the existence and benefits of implicit adaptive optimization. Similarly, we confirm our prediction that the presence of momentum (β = 0.9) is essential to amplify the effects of implicit adaptive optimization, as seen in test accuracy (B-1) and loss dynamics (B-2). Finally, we demonstrate the benefits of symmetry breaking due to weight decay k, which acts in analogy with the discounting factor for the cumulative gradient norms of RMSProp. Indeed, the final test accuracy (C-1) and the learning dynamics of the loss (C-2) are both enhanced by the presence of weight decay k, in agreement with [39]. (A sketch of these hyperparameter settings also follows the table.)
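The Research Type row quotes the paper's core experiment: VGG11 with batch normalization trained on Tiny-ImageNet. Since the paper releases no code, the following is a minimal PyTorch sketch under stated assumptions: torchvision's vgg11_bn stands in for the authors' model, and the data path, transforms, and batch size are illustrative.

```python
# Minimal sketch (not the authors' code): VGG11 with batch normalization
# trained on Tiny-ImageNet, as quoted in the Research Type row.
# Data path, transforms, and batch size are assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Tiny-ImageNet: 200 classes of 64x64 images, assumed in ImageFolder layout.
train_set = datasets.ImageFolder("tiny-imagenet-200/train", transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.vgg11_bn(num_classes=200)  # batch-normalized VGG11
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # small-LR regime (A)

model.train()
for inputs, targets in train_loader:  # one epoch shown
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
```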
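The Experiment Setup row compares four ingredients: small vs. large learning rate (0.001 vs. 0.03), momentum off vs. β = 0.9, weight decay off vs. on, and unconstrained vs. constrained filter norms. Below is a hedged sketch of that grid, again in PyTorch: make_optimizer, record_filter_norms, and project_filter_norms are hypothetical helper names, the weight-decay coefficient 5e-4 is a placeholder (the exact value is not quoted above), and projecting each convolutional filter back to its initialization norm after every step is one simple realization of the "constrained filter norm" (blue) runs; the paper may enforce the constraint differently.

```python
# Sketch of the Fig. 4 hyperparameter grid (values from the quoted text;
# helper names and the weight-decay coefficient are assumptions).
import torch

def make_optimizer(model, lr=0.001, momentum=0.0, weight_decay=0.0):
    """SGD variants compared in Fig. 4: lr in {0.001, 0.03},
    momentum in {0.0, 0.9}, weight decay off or on (e.g., 5e-4)."""
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=momentum, weight_decay=weight_decay)

@torch.no_grad()
def record_filter_norms(model):
    """Record the L2 norm of each conv filter (e.g., at initialization)."""
    return {name: m.weight.flatten(1).norm(dim=1).clone()
            for name, m in model.named_modules()
            if isinstance(m, torch.nn.Conv2d)}

@torch.no_grad()
def project_filter_norms(model, target_norms):
    """Rescale every conv filter back to its recorded norm after each
    optimizer step, emulating the 'constrained filter norm' runs."""
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.Conv2d):
            w = m.weight  # shape: (out_channels, in_channels, k, k)
            current = w.flatten(1).norm(dim=1).clamp_min(1e-12)
            w.mul_((target_norms[name] / current).view(-1, 1, 1, 1))
```

The comparison is meaningful because batch normalization makes the loss invariant to the scale of each preceding filter: leaving the norms unconstrained lets their growth rescale each filter's effective learning rate, which is the "implicit adaptive optimization" the quoted passage describes, and which (per the quoted text) momentum amplifies and weight decay discounts in analogy with RMSProp.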