Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks

Authors: Hidenori Tanaka, Daniel Kunin

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E). (A minimal training sketch follows the table.)
Researcher Affiliation | Collaboration | Hidenori Tanaka (Physics & Informatics Laboratories, NTT Research, Inc., Sunnyvale, CA, USA); Daniel Kunin (Stanford University, Stanford, CA, USA)
Pseudocode | No | The paper does not contain any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statements about open-sourcing code or links to a code repository for the described methodology.
Open Datasets | Yes | Finally, we empirically validate these predictions by training convolutional neural networks with batch normalization (VGG11) on a large data set (Tiny-ImageNet) under various hyperparameter settings as in Fig. 4 (see SI Sec. E).
Dataset Splits | No | The paper mentions training on Tiny-ImageNet and evaluating test accuracy, but does not provide specific details on dataset splits (e.g., percentages, sample counts, or explicit splitting methodology).
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., library names like PyTorch, TensorFlow, or specific solver versions).
Experiment Setup | Yes | When trained with a constant small learning rate (0.001), the final test accuracy (A-1) and loss dynamics (A-2) of models with unconstrained filter norm (pink) and constrained filter norm (blue) are almost identical. However, in a large learning rate regime (0.03), the models trained with unconstrained filter norm (pink) outperform the models trained with constrained filter norm (blue), demonstrating the existence and benefits of implicit adaptive optimization. Similarly, we confirm our prediction that the presence of momentum (β = 0.9) is essential to amplify the effects of implicit adaptive optimization, as seen in test accuracy (B-1) and loss dynamics (B-2). Finally, we demonstrate the benefits of symmetry breaking due to weight decay k, which acts in analogy with the discounting factor for the cumulative gradient norms of RMSProp. Indeed, the final test accuracy (C-1) and the learning dynamics of the loss (C-2) are both enhanced by the presence of weight decay k, in agreement with [39]. (A sketch of these hyperparameter settings also follows the table.)
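The Research Type row quotes the paper's core experiment: VGG11 with batch normalization trained on Tiny-ImageNet. Since the paper releases no code, the following is a minimal PyTorch sketch under stated assumptions: torchvision's vgg11_bn stands in for the authors' model, and the data path, transforms, and batch size are illustrative.

```python
# Minimal sketch (not the authors' code): VGG11 with batch normalization
# trained on Tiny-ImageNet, as quoted in the Research Type row.
# Data path, transforms, and batch size are assumptions.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Tiny-ImageNet: 200 classes of 64x64 images, assumed in ImageFolder layout.
train_set = datasets.ImageFolder("tiny-imagenet-200/train", transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

model = models.vgg11_bn(num_classes=200)  # batch-normalized VGG11
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # small-LR regime (A)

model.train()
for inputs, targets in train_loader:  # one epoch shown
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
```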
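The Experiment Setup row compares four ingredients: small vs. large learning rate (0.001 vs. 0.03), momentum off vs. β = 0.9, weight decay off vs. on, and unconstrained vs. constrained filter norms. Below is a hedged sketch of that grid, again in PyTorch: make_optimizer, record_filter_norms, and project_filter_norms are hypothetical helper names, the weight-decay coefficient 5e-4 is a placeholder (the exact value is not quoted above), and projecting each convolutional filter back to its initialization norm after every step is one simple realization of the "constrained filter norm" (blue) runs; the paper may enforce the constraint differently.

```python
# Sketch of the Fig. 4 hyperparameter grid (values from the quoted text;
# helper names and the weight-decay coefficient are assumptions).
import torch

def make_optimizer(model, lr=0.001, momentum=0.0, weight_decay=0.0):
    """SGD variants compared in Fig. 4: lr in {0.001, 0.03},
    momentum in {0.0, 0.9}, weight decay off or on (e.g., 5e-4)."""
    return torch.optim.SGD(model.parameters(), lr=lr,
                           momentum=momentum, weight_decay=weight_decay)

@torch.no_grad()
def record_filter_norms(model):
    """Record the L2 norm of each conv filter (e.g., at initialization)."""
    return {name: m.weight.flatten(1).norm(dim=1).clone()
            for name, m in model.named_modules()
            if isinstance(m, torch.nn.Conv2d)}

@torch.no_grad()
def project_filter_norms(model, target_norms):
    """Rescale every conv filter back to its recorded norm after each
    optimizer step, emulating the 'constrained filter norm' runs."""
    for name, m in model.named_modules():
        if isinstance(m, torch.nn.Conv2d):
            w = m.weight  # shape: (out_channels, in_channels, k, k)
            current = w.flatten(1).norm(dim=1).clamp_min(1e-12)
            w.mul_((target_norms[name] / current).view(-1, 1, 1, 1))
```

The comparison is meaningful because batch normalization makes the loss invariant to the scale of each preceding filter: leaving the norms unconstrained lets their growth rescale each filter's effective learning rate, which is the "implicit adaptive optimization" the quoted passage describes, and which (per the quoted text) momentum amplifies and weight decay discounts in analogy with RMSProp.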