Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Authors: Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on multiple networks show that saddle-points are not the cause of the generalization gap of large batch size training, and the results consistently show that large batch training converges to points with noticeably higher Hessian spectrum. We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. |
| Researcher Affiliation | Academia | Zhewei Yao1 Amir Gholami1 Qi Lei2 Kurt Keutzer1 Michael W. Mahoney1 1 University of California at Berkeley, {zheweiy, amirgh, keutzer and mahoneymw}@berkeley.edu 2 University of Texas at Austin, leiqi@ices.utexas.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a follow-up paper ([33]) that designed a new algorithm, but does not provide any link or explicit statement about the source code for the methodology presented in this paper. |
| Open Datasets | Yes | We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. |
| Dataset Splits | Yes | We present detailed experiments with five different network architectures, including a residual network, tested on MNIST, CIFAR-10, and CIFAR-100 datasets. For the original training, we set the learning rate to 0.01 and momentum to 0.9, and decay the learning rate by half after every 5 epochs, for a total of 100 epochs. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | For the original training, we set the learning rate to 0.01 and momentum to 0.9, and decay the learning rate by half after every 5 epochs, for a total of 100 epochs. Then we perform an additional five epochs of adversarial training with a learning rate of 0.01. The perturbation magnitude, ε, is set to 0.1 for the L1 attack and 2.8 for the L2 attack. We also present results for the C3 model [4] on CIFAR-10, using the same hyper-parameters, except that the training is performed for 100 epochs. Afterwards, adversarial training is performed for a subsequent 10 epochs with a learning rate of 0.01 and momentum of 0.9 (the learning rate is decayed by half after five epochs). Furthermore, the adversarial perturbation magnitude is set to ε = 0.02 for the L1 attack and 1.2 for the L2 attack [27]. |
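
For concreteness, the schedule quoted in the Experiment Setup row (SGD with learning rate 0.01, momentum 0.9, learning rate halved every 5 epochs, 100 epochs of standard training followed by a short adversarial fine-tune) can be sketched as below. This is a minimal sketch, assuming PyTorch: the tiny model, random data, and the FGSM-style sign-of-gradient attack are illustrative placeholders, not the authors' released networks, datasets, or attack implementation.

```python
import torch
import torch.nn as nn

# Sketch of the quoted schedule: SGD, lr=0.01, momentum=0.9,
# learning rate halved every 5 epochs, 100 epochs of training.
# Model and data below are placeholders (not the paper's setup).
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
inputs = torch.randn(64, 1, 28, 28)              # stand-in for an MNIST batch
targets = torch.randint(0, 10, (64,))

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(100):                         # original training
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    scheduler.step()                             # halves lr every 5 epochs

# Five extra epochs of adversarial training at lr=0.01. A sign-of-gradient
# (FGSM-style) perturbation is used purely as an illustrative attack; the
# paper's exact attack construction may differ.
eps = 0.1
adv_opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
for epoch in range(5):
    inputs.requires_grad_(True)
    grad, = torch.autograd.grad(criterion(model(inputs), targets), inputs)
    adv_inputs = (inputs + eps * grad.sign()).detach()
    inputs = inputs.detach()
    adv_opt.zero_grad()
    criterion(model(adv_inputs), targets).backward()
    adv_opt.step()
```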
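
The "higher Hessian spectrum" finding quoted in the Research Type row concerns the dominant eigenvalues of the training-loss Hessian. A common matrix-free way to estimate the top eigenvalue is power iteration on Hessian-vector products computed by double backpropagation; the sketch below illustrates that general technique under placeholder assumptions (small model, random data) and is not the authors' own tooling, which is not linked from this entry.

```python
import torch
import torch.nn as nn

# Sketch: estimate the top eigenvalue of the loss Hessian with power
# iteration on Hessian-vector products (matrix-free, double backprop).
# Model and data are placeholders; this shows the general technique only.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
criterion = nn.CrossEntropyLoss()
inputs, targets = torch.randn(64, 1, 28, 28), torch.randint(0, 10, (64,))

params = [p for p in model.parameters() if p.requires_grad]
loss = criterion(model(inputs), targets)
grads = torch.autograd.grad(loss, params, create_graph=True)

v = [torch.randn_like(p) for p in params]        # random starting vector
for _ in range(20):                              # power iteration steps
    # Hessian-vector product: gradient of (grad . v) w.r.t. the parameters
    hv = torch.autograd.grad(
        sum((g * u).sum() for g, u in zip(grads, v)),
        params, retain_graph=True)
    norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
    v = [h / norm for h in hv]

# Rayleigh quotient with the (approximately) converged eigenvector
hv = torch.autograd.grad(
    sum((g * u).sum() for g, u in zip(grads, v)), params, retain_graph=True)
top_eig = sum((h * u).sum() for h, u in zip(hv, v))
print(f"estimated top Hessian eigenvalue: {top_eig.item():.4f}")
```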