On the Local Hessian in Back-propagation
Authors: Huishuai Zhang, Wei Chen, Tie-Yan Liu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply it to train neural networks with batch normalization, and achieve favorable results over vanilla SGD. This corroborates the importance of local Hessian from another side. |
| Researcher Affiliation | Industry | Huishuai Zhang, Microsoft Research Asia, Beijing, 100080; Wei Chen, Microsoft Research Asia, Beijing, 100080; Tie-Yan Liu, Microsoft Research Asia, Beijing, 100080 |
| Pseudocode | Yes | Procedure 1 Back-matching Propagation... Algorithm 2 Scale-amended SGD |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We next evaluate the scale-amended SGD on training VGG nets [Simonyan and Zisserman, 2015] for image classification tasks with two datasets: CIFAR-10 [Krizhevsky and Hinton, 2009] and CIFAR-100 [Krizhevsky and Hinton, 2009]. |
| Dataset Splits | Yes | We reduce the learning rate by half once the validation accuracy is on a plateau (ReduceLROnPlateau in PyTorch with patience=10). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number. |
| Experiment Setup | Yes | The hyper-parameters for vanilla SGD and our scale-amended SGD are the same, including learning rate 0.1 (because the backward factor for the linear layer of CIFAR-10 is around 10/512, a smaller learning rate of 0.005 works better for CIFAR-10 with scale-amended SGD), momentum 0.9, and weight decay coefficient 0.005. (A PyTorch sketch of this training setup follows the table.) |
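
The dataset, learning-rate-schedule, and hyper-parameter rows above describe a fairly standard PyTorch training loop. The sketch below assembles those pieces (SGD with learning rate 0.1, momentum 0.9, weight decay 0.005, and ReduceLROnPlateau halving the rate with patience=10 on validation accuracy) around CIFAR-10 loaders. The VGG variant, batch size, epoch count, data transforms, validation split, and the `evaluate` helper are assumptions not pinned down in the table, and the paper's scale-amended SGD itself is not reproduced here.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Stand-in model: torchvision's VGG-16 with a 10-class head. The paper uses
# CIFAR-adapted VGG nets whose exact configuration is not restated in the table.
model = torchvision.models.vgg16(num_classes=10)

# Hyper-parameters quoted in the table: lr=0.1, momentum=0.9, weight decay=0.005.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.005)

# "Reduce the learning rate by half once the validation accuracy is on a
# plateau (ReduceLROnPlateau in PyTorch with patience=10)"; mode="max" because
# the monitored quantity is an accuracy.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=10)

criterion = torch.nn.CrossEntropyLoss()

# CIFAR-10 loaders; batch size 128 and the plain ToTensor transform are
# assumptions, and the test split stands in for a validation set.
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, shuffle=False)


def evaluate(net, loader):
    """Hypothetical helper: top-1 accuracy over a loader."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            preds = net(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    return correct / total


for epoch in range(100):  # epoch count assumed; the table does not state it
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    # Halve the learning rate when validation accuracy stops improving.
    scheduler.step(evaluate(model, val_loader))
```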