An Investigation into Neural Net Optimization via Hessian Eigenvalue Density

Authors: Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using this, we study a number of hypotheses concerning smoothness, curvature, and sharpness in the deep learning literature. We then thoroughly analyze a crucial structural feature of the spectra: in non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In batch normalized networks, these two effects are almost absent. We characterize these effects, and explain how they affect optimization speed through both theory and experiments. (A sketch of a gradient-concentration diagnostic appears after this table.)
Researcher Affiliation | Collaboration | Behrooz Ghorbani (1, 2), Shankar Krishnan (2), Ying Xiao (2). 1: Department of Electrical Engineering, Stanford University; work was done while the author was an intern at Google. 2: Machine Perception, Google Inc. Correspondence to: Behrooz Ghorbani <ghorbani@stanford.edu>.
Pseudocode | Yes | We give the pseudo-code in Algorithm 1, and describe the individual steps below, deferring a discussion of the various approximations to Section 2.2. (A hedged sketch of a Lanczos-based estimator in this spirit appears after this table.)
Open Source Code | Yes | We believe our tool and style of analysis will open up new avenues of research in optimization, generalization, architecture design etc. So we release our code to the community to accelerate a Hessian based analysis of deep learning.
Open Datasets | Yes | For our analysis, we study a variety of Resnet and VGG (Simonyan & Zisserman, 2014) architectures on both CIFAR-10 and ImageNet.
Dataset Splits | No | The paper states it uses the CIFAR-10 and ImageNet datasets but does not explicitly provide details about the training, validation, or test splits, such as percentages or sample counts.
Hardware Specification | No | The paper mentions 'GPUs' but does not provide specific details on the hardware used, such as exact GPU or CPU models or memory specifications.
Software Dependencies | No | The paper mentions implementing the algorithm in TensorFlow, but does not provide a specific version number for TensorFlow or for any other software dependency. (A sketch of the kind of TensorFlow Hessian-vector product such an implementation needs appears after this table.)
Experiment Setup | No | The paper refers to 'momentum steps' and a 'learning rate decrease (at step 40000)', and states that 'Details are presented in Appendix F', but the main text does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer settings for training the networks.
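The rows above defer most implementation detail to the paper and its released code. To make the assessed method concrete, three short Python sketches follow. First, the Hessian-vector product: the paper's estimator never materializes the Hessian, only products Hv computed by double backpropagation. This is a minimal sketch assuming TensorFlow 2.x and a Keras-style model; the function and argument names are illustrative assumptions, not the authors' released API.

import tensorflow as tf

def hessian_vector_product(model, loss_fn, x, y, vecs):
    """Compute H v by double backprop, without materializing the Hessian.
    vecs is a list of tensors matching model.trainable_variables in shape.
    (Illustrative sketch; not the authors' released implementation.)"""
    with tf.GradientTape() as outer_tape:
        with tf.GradientTape() as inner_tape:
            loss = loss_fn(y, model(x, training=True))
        grads = inner_tape.gradient(loss, model.trainable_variables)
        # Inner product <grad, v>; differentiating it again yields H v.
        gv = tf.add_n([tf.reduce_sum(g * v) for g, v in zip(grads, vecs)])
    return outer_tape.gradient(gv, model.trainable_variables)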
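Second, the Lanczos iteration behind an Algorithm 1-style estimator. Given a black-box hvp, Lanczos builds a small tridiagonal matrix T whose eigenvalues (Ritz values) and squared first eigenvector components serve as quadrature nodes and weights for the spectral density. The step count, helper names, and the explicit demo matrix below are illustrative assumptions, not the released code.

import numpy as np

def lanczos(hvp, dim, num_steps, rng):
    """Tridiagonalize the implicit Hessian; return the diagonal (alphas)
    and off-diagonal (betas) of the Lanczos matrix T. No breakdown
    handling, for brevity."""
    alphas = np.zeros(num_steps)
    betas = np.zeros(num_steps - 1)
    V = np.zeros((num_steps, dim))  # basis kept for full reorthogonalization
    v = rng.standard_normal(dim)
    V[0] = v / np.linalg.norm(v)
    w = hvp(V[0])
    alphas[0] = w @ V[0]
    w = w - alphas[0] * V[0]
    for i in range(1, num_steps):
        betas[i - 1] = np.linalg.norm(w)
        v = w / betas[i - 1]
        v -= V[:i].T @ (V[:i] @ v)  # full reorthogonalization
        V[i] = v / np.linalg.norm(v)
        w = hvp(V[i])
        alphas[i] = w @ V[i]
        w = w - alphas[i] * V[i] - betas[i - 1] * V[i - 1]
    return alphas, betas

def quadrature_nodes_weights(alphas, betas):
    """Ritz values are the quadrature nodes; the squared first components
    of T's eigenvectors are the quadrature weights."""
    T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    nodes, evecs = np.linalg.eigh(T)
    return nodes, evecs[0, :] ** 2

# Illustrative demo on an explicit symmetric matrix; the real use case
# replaces the lambda with a network's Hessian-vector product.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
A = (A + A.T) / 2
nodes, weights = quadrature_nodes_weights(*lanczos(lambda v: A @ v, 500, 80, rng))

In a full estimator of this kind, the node/weight pairs from several independent random starting vectors are averaged and then smoothed with a narrow Gaussian kernel to produce a plottable density estimate.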
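Third, the gradient-concentration effect quoted in the Research Type row can be quantified as the fraction of the squared gradient norm lying in the top-k Hessian eigenspace. A hedged sketch using SciPy's matrix-free eigensolver follows; the function name, the default k=10, and the assumption that hvp and grad operate on flat parameter vectors are all illustrative choices, not the paper's code.

import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def top_eigenspace_overlap(hvp, grad, dim, k=10):
    """Return ||P_k g||^2 / ||g||^2, the fraction of squared gradient
    norm captured by the top-k Hessian eigenspace. hvp maps a flat
    vector of length dim to H v; grad is the flat gradient."""
    op = LinearOperator((dim, dim), matvec=hvp)
    _, eigvecs = eigsh(op, k=k, which='LA')  # k algebraically largest
    coeffs = eigvecs.T @ grad                # projections onto eigenvectors
    return float(coeffs @ coeffs) / float(grad @ grad)

Per the excerpt above, a diagnostic like this would show the fraction rising rapidly early in training for non-batch normalized networks, while remaining small when batch normalization is used.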