An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

Authors: Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this, we study a number of hypotheses concerning smoothness, curvature, and sharpness in the deep learning literature. We then thoroughly analyze a crucial structural feature of the spectra: in non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In batch normalized networks, these two effects are almost absent. We characterize these effects, and explain how they affect optimization speed through both theory and experiments. (A sketch of a gradient-concentration diagnostic appears after this table.)
Researcher Affiliation | Collaboration | Behrooz Ghorbani (1, 2), Shankar Krishnan (2), Ying Xiao (2). 1: Department of Electrical Engineering, Stanford University; work was done while the author was an intern at Google. 2: Machine Perception, Google Inc. Correspondence to: Behrooz Ghorbani <ghorbani@stanford.edu>.
Pseudocode | Yes | We give the pseudo-code in Algorithm 1, and describe the individual steps below, deferring a discussion of the various approximations to Section 2.2. (A hedged sketch of a Lanczos-based estimator in this spirit appears after this table.)
Open Source Code | Yes | We believe our tool and style of analysis will open up new avenues of research in optimization, generalization, architecture design etc. So we release our code to the community to accelerate a Hessian based analysis of deep learning.
Open Datasets | Yes | For our analysis, we study a variety of Resnet and VGG (Simonyan & Zisserman, 2014) architectures on both CIFAR-10 and ImageNet.
Dataset Splits | No | The paper states it uses the CIFAR-10 and ImageNet datasets but does not explicitly provide details about the training, validation, or test splits, such as percentages or sample counts.
Hardware Specification | No | The paper mentions 'GPUs' but does not provide specific details on the hardware used, such as exact GPU or CPU models or memory specifications.
Software Dependencies | No | The paper mentions implementing the algorithm in TensorFlow, but does not provide a specific version number for TensorFlow or for any other software dependency. (A sketch of the kind of TensorFlow Hessian-vector product such an implementation needs appears after this table.)
Experiment Setup | No | The paper refers to 'momentum steps' and a 'learning rate decrease (at step 40000)', and states that 'Details are presented in Appendix F', but the main text does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer settings for training the networks.
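The rows above defer most implementation detail to the paper and its released code. To make the assessed method concrete, three short Python sketches follow. First, the Hessian-vector product: the paper's estimator never materializes the Hessian, only products Hv computed by double backpropagation. This is a minimal sketch assuming TensorFlow 2.x and a Keras-style model; the function and argument names are illustrative assumptions, not the authors' released API.

import tensorflow as tf

def hessian_vector_product(model, loss_fn, x, y, vecs):
    """Compute H v by double backprop, without materializing the Hessian.
    vecs is a list of tensors matching model.trainable_variables in shape.
    (Illustrative sketch; not the authors' released implementation.)"""
    with tf.GradientTape() as outer_tape:
        with tf.GradientTape() as inner_tape:
            loss = loss_fn(y, model(x, training=True))
        grads = inner_tape.gradient(loss, model.trainable_variables)
        # Inner product <grad, v>; differentiating it again yields H v.
        gv = tf.add_n([tf.reduce_sum(g * v) for g, v in zip(grads, vecs)])
    return outer_tape.gradient(gv, model.trainable_variables)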
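Second, the Lanczos iteration behind an Algorithm 1-style estimator. Given a black-box hvp, Lanczos builds a small tridiagonal matrix T whose eigenvalues (Ritz values) and squared first eigenvector components serve as quadrature nodes and weights for the spectral density. The step count, helper names, and the explicit demo matrix below are illustrative assumptions, not the released code.

import numpy as np

def lanczos(hvp, dim, num_steps, rng):
    """Tridiagonalize the implicit Hessian; return the diagonal (alphas)
    and off-diagonal (betas) of the Lanczos matrix T. No breakdown
    handling, for brevity."""
    alphas = np.zeros(num_steps)
    betas = np.zeros(num_steps - 1)
    V = np.zeros((num_steps, dim))  # basis kept for full reorthogonalization
    v = rng.standard_normal(dim)
    V[0] = v / np.linalg.norm(v)
    w = hvp(V[0])
    alphas[0] = w @ V[0]
    w = w - alphas[0] * V[0]
    for i in range(1, num_steps):
        betas[i - 1] = np.linalg.norm(w)
        v = w / betas[i - 1]
        v -= V[:i].T @ (V[:i] @ v)  # full reorthogonalization
        V[i] = v / np.linalg.norm(v)
        w = hvp(V[i])
        alphas[i] = w @ V[i]
        w = w - alphas[i] * V[i] - betas[i - 1] * V[i - 1]
    return alphas, betas

def quadrature_nodes_weights(alphas, betas):
    """Ritz values are the quadrature nodes; the squared first components
    of T's eigenvectors are the quadrature weights."""
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    nodes, evecs = np.linalg.eigh(T)
    return nodes, evecs[0, :] ** 2

# Illustrative demo on an explicit symmetric matrix; the real use case
# replaces the lambda with a network's Hessian-vector product.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
A = (A + A.T) / 2
nodes, weights = quadrature_nodes_weights(*lanczos(lambda v: A @ v, 500, 80, rng))

In a full estimator of this kind, the node/weight pairs from several independent random starting vectors are averaged and then smoothed with a narrow Gaussian kernel to produce a plottable density estimate.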
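Third, the gradient-concentration effect quoted in the Research Type row can be quantified as the fraction of the squared gradient norm lying in the top-k Hessian eigenspace. A hedged sketch using SciPy's matrix-free eigensolver follows; the function name, the default k=10, and the assumption that hvp and grad operate on flat parameter vectors are all illustrative choices, not the paper's code.

import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def top_eigenspace_overlap(hvp, grad, dim, k=10):
    """Return ||P_k g||^2 / ||g||^2, the fraction of squared gradient
    norm captured by the top-k Hessian eigenspace. hvp maps a flat
    vector of length dim to H v; grad is the flat gradient."""
    op = LinearOperator((dim, dim), matvec=hvp)
    _, eigvecs = eigsh(op, k=k, which='LA')  # k algebraically largest
    coeffs = eigvecs.T @ grad                # projections onto eigenvectors
    return float(coeffs @ coeffs) / float(grad @ grad)

Per the excerpt above, a diagnostic like this would show the fraction rising rapidly early in training for non-batch normalized networks, while remaining small when batch normalization is used.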