An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Authors: Behrooz Ghorbani, Shankar Krishnan, Ying Xiao
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using this, we study a number of hypotheses concerning smoothness, curvature, and sharpness in the deep learning literature. We then thoroughly analyze a crucial structural feature of the spectra: in non-batch normalized networks, we observe the rapid appearance of large isolated eigenvalues in the spectrum, along with a surprising concentration of the gradient in the corresponding eigenspaces. In batch normalized networks, these two effects are almost absent. We characterize these effects, and explain how they affect optimization speed through both theory and experiments. |
| Researcher Affiliation | Collaboration | Behrooz Ghorbani¹٬² Shankar Krishnan² Ying Xiao² — ¹Department of Electrical Engineering, Stanford University; work was done while the author was an intern at Google. ²Machine Perception, Google Inc. Correspondence to: Behrooz Ghorbani <ghorbani@stanford.edu>. |
| Pseudocode | Yes | We give the pseudo-code in Algorithm 1, and describe the individual steps below, deferring a discussion of the various approximations to Section 2.2. |
| Open Source Code | Yes | We believe our tool and style of analysis will open up new avenues of research in optimization, generalization, architecture design etc. So we release our code to the community to accelerate a Hessian based analysis of deep learning. |
| Open Datasets | Yes | For our analysis, we study a variety of ResNet and VGG (Simonyan & Zisserman, 2014) architectures on both CIFAR-10 and ImageNet. |
| Dataset Splits | No | The paper states it uses the CIFAR-10 and ImageNet datasets but does not explicitly provide details about the training, validation, or test splits, such as percentages or sample counts. |
| Hardware Specification | No | The paper mentions 'GPUs' but does not provide specific details on the hardware used, such as exact GPU or CPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions implementing the algorithm in 'TensorFlow', but does not provide a specific version number for TensorFlow or any other software dependencies. |
| Experiment Setup | No | The paper refers to 'momentum steps' and 'learning rate decrease (at step 40000)', and states that 'Details are presented in Appendix F', but the main text does not provide specific hyperparameter values like learning rates, batch sizes, or optimizer settings for the neural network training. |
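The Algorithm 1 referenced in the Pseudocode row is the paper's Hessian spectral-density estimator, which is based on stochastic Lanczos quadrature: run Lanczos with a random probe vector using only Hessian-vector products, then read off Ritz values and quadrature weights from the resulting tridiagonal matrix. Below is a minimal NumPy sketch of that idea, not the authors' released implementation; the function names are mine, and a small explicit symmetric matrix stands in for the network Hessian (in practice the matrix-vector product would come from double backpropagation).

```python
import numpy as np

def lanczos(matvec, dim, m, rng):
    """Run m Lanczos steps with full reorthogonalization.
    Returns the m x m symmetric tridiagonal matrix T."""
    V = np.zeros((m, dim))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    v = rng.standard_normal(dim)
    V[0] = v / np.linalg.norm(v)
    for j in range(m):
        w = matvec(V[j])
        alpha[j] = V[j] @ w
        w = w - alpha[j] * V[j]
        if j > 0:
            w = w - beta[j - 1] * V[j - 1]
        # full reorthogonalization against all previous Lanczos vectors,
        # which keeps the recurrence numerically stable
        w = w - V[: j + 1].T @ (V[: j + 1] @ w)
        if j < m - 1:
            beta[j] = np.linalg.norm(w)
            V[j + 1] = w / beta[j]
    return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

def spectral_density(matvec, dim, m=30, n_probes=4,
                     grid=None, sigma=0.05, seed=0):
    """Estimate the eigenvalue density as a Gaussian-smoothed mixture of
    Ritz values, weighted by the squared first components of T's
    eigenvectors (the Gaussian quadrature weights)."""
    rng = np.random.default_rng(seed)
    if grid is None:
        grid = np.linspace(-2.0, 6.0, 200)  # range chosen for the toy example
    density = np.zeros_like(grid)
    for _ in range(n_probes):
        T = lanczos(matvec, dim, m, rng)
        ritz, U = np.linalg.eigh(T)
        weights = U[0] ** 2
        for lam, w in zip(ritz, weights):
            density += w * np.exp(-((grid - lam) ** 2) / (2 * sigma ** 2))
    return grid, density / (n_probes * np.sqrt(2 * np.pi) * sigma)

# Toy stand-in for a Hessian: a bulk of eigenvalues in [-1, 1]
# plus one large isolated eigenvalue at 5.
A = np.diag(np.concatenate([np.linspace(-1.0, 1.0, 99), [5.0]]))
grid, dens = spectral_density(lambda v: A @ v, dim=100)
# The estimated density shows a bulk near zero and a small isolated
# spike near 5, mirroring the outlier eigenvalues the paper reports
# in non-batch-normalized networks.
```

The estimator needs only O(m) Hessian-vector products per probe, which is what makes it tractable for networks with millions of parameters.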