Neglected Hessian component explains mysteries in sharpness regularization
Authors: Yann N. Dauphin, Atish Agarwala, Hossein Mobahi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We provide empirical and theoretical evidence that the NME is important to the performance of gradient penalties and explains their sensitivity to activation functions." and "We conducted experimental studies to answer this question in the context of curvature regularization algorithms which seek to promote convergence to flat areas of the loss landscape." (Gradient penalties of this kind are illustrated in the sketch after this table.) |
| Researcher Affiliation | Industry | Yann N. Dauphin, Google DeepMind, ynd@google.com; Atish Agarwala, Google DeepMind, thetish@google.com; Hossein Mobahi, Google DeepMind, hmobahi@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. Appendix C.2 contains actual JAX code snippets, not pseudocode. |
| Open Source Code | No | Publicly available datasets are used, but the code is not open source. (from NeurIPS Checklist Q5) |
| Open Datasets | Yes | Fashion MNIST: "We also include results on Fashion MNIST [24]." CIFAR-10: "We provide results on the CIFAR-10 dataset [25]." ImageNet: "We conduct experiments on the popular Imagenet dataset [27]." |
| Dataset Splits | No | The paper references standard setups for CIFAR-10 and ImageNet, which imply standard splits, but it does not explicitly state training, validation, and test splits with percentages or sample counts. |
| Hardware Specification | Yes | "Models are trained on 8 Nvidia Volta GPUs" and "Models are trained using TPU V3 chips." |
| Software Dependencies | No | The paper mentions using JAX (Appendix C.2, reference [40]) but does not provide specific version numbers for JAX or any other software dependencies. |
| Experiment Setup | Yes | "All experiments use the Wide Resnet 28-10 architecture with the same setup and hyperparameters as [26], except for the use of cosine learning rate decay. Batch size is 128." and "All experiments use the Resnet-50 architecture with the same setup and hyperparameters as [28], except that we use cosine learning rate decay [29] over 350 epochs. Batch size is set to 1024." |
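
The paper studies gradient penalties for sharpness regularization and provides JAX snippets in Appendix C.2. As a minimal sketch of the general idea only, and not the authors' implementation, the code below adds the squared norm of the loss gradient to a toy training objective; `base_loss`, `penalized_loss`, and the linear model are hypothetical names introduced here for illustration.

```python
# Illustrative sketch only -- NOT the paper's Appendix C.2 code.
# All names (base_loss, penalized_loss, the toy linear model) are hypothetical.
import jax
import jax.numpy as jnp


def base_loss(params, batch):
    # Toy task loss: mean squared error of a linear model.
    inputs, targets = batch
    preds = inputs @ params["w"] + params["b"]
    return jnp.mean((preds - targets) ** 2)


def penalized_loss(params, batch, rho=0.1):
    # Task loss plus a penalty on the squared gradient norm, a common way
    # to bias training toward flatter regions of the loss landscape.
    loss, grads = jax.value_and_grad(base_loss)(params, batch)
    grad_sq_norm = sum(jnp.sum(g ** 2) for g in jax.tree_util.tree_leaves(grads))
    return loss + rho * grad_sq_norm


# Example usage with random data; differentiating penalized_loss requires
# second-order autodiff, which JAX supports.
key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (4, 1)), "b": jnp.zeros((1,))}
batch = (jax.random.normal(key, (8, 4)), jnp.ones((8, 1)))
total_grads = jax.grad(penalized_loss)(params, batch)
```

In a sketch like this, differentiating the penalty term involves the Hessian of the loss, which is where second-order structure such as the NME component studied in the paper enters training.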