On the Local Hessian in Back-propagation

Authors: Huishuai Zhang, Wei Chen, Tie-Yan Liu

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply it to train neural networks with batch normalization, and achieve favorable results over vanilla SGD. This corroborates the importance of local Hessian from another side.
Researcher Affiliation | Industry | Huishuai Zhang, Microsoft Research Asia, Beijing 100080; Wei Chen, Microsoft Research Asia, Beijing 100080; Tie-Yan Liu, Microsoft Research Asia, Beijing 100080
Pseudocode | Yes | Procedure 1 Back-matching Propagation... Algorithm 2 Scale-amended SGD (see the sketch below the table)
Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code.
Open Datasets | Yes | We next evaluate the scale-amended SGD on training VGG nets [Simonyan and Zisserman, 2015] for image classification tasks with two datasets: CIFAR-10 [Krizhevsky and Hinton, 2009] and CIFAR-100 [Krizhevsky and Hinton, 2009]. (Data-loading sketch below the table.)
Dataset Splits | Yes | We reduce the learning rate by half once the validation accuracy is on a plateau (ReduceLROnPlateau in PyTorch with patience=10). (Scheduler sketch below the table.)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments were mentioned.
Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number.
Experiment Setup | Yes | The hyper-parameters for vanilla SGD and our scale-amended SGD are the same, including learning rate = 0.1 (because the backward factor for the linear layer on CIFAR-10 is around 10/512, a small learning rate = 0.005 works better for CIFAR-10 with scale-amended SGD), momentum 0.9, and weight decay coefficient 0.005. (Optimizer sketch below the table.)
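
For the Pseudocode row, the paper's Procedure 1 (back-matching propagation) and Algorithm 2 (scale-amended SGD) are not reproduced here. The sketch below only illustrates the general shape of a scale-amended update in PyTorch: each layer's gradient is multiplied by a per-layer scale before an otherwise ordinary SGD step. The `layer_scales` mapping and the prefix-matching helper are hypothetical placeholders; the actual factors in the paper are derived from the local Hessian and are not shown here.

```python
import torch

@torch.no_grad()
def scale_amended_step(named_params, layer_scales, lr=0.1, weight_decay=0.005):
    """One SGD step with per-layer gradient scaling (momentum omitted for brevity).

    `layer_scales` maps a parameter-name prefix (e.g. 'classifier') to a scalar;
    these values are hypothetical placeholders, not the backward factors
    derived in the paper.
    """
    for name, p in named_params:
        if p.grad is None:
            continue
        # Pick the scale whose prefix matches this parameter; default to 1.0.
        scale = next((s for prefix, s in layer_scales.items()
                      if name.startswith(prefix)), 1.0)
        g = p.grad + weight_decay * p        # L2 weight decay on the raw gradient
        p.add_(g, alpha=-lr * scale)         # scaled SGD update

# Usage with hypothetical scales:
# scale_amended_step(model.named_parameters(), {'classifier': 0.02})
```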
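
For the Open Datasets row, CIFAR-10 and CIFAR-100 are publicly available and can be fetched through torchvision (the review already identifies PyTorch as the framework). This is a minimal loading sketch; the augmentation and batch size are common defaults, not settings reported in the paper.

```python
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Common CIFAR preprocessing (assumed, not taken from the paper).
train_tf = T.Compose([T.RandomCrop(32, padding=4),
                      T.RandomHorizontalFlip(),
                      T.ToTensor()])

train_set = torchvision.datasets.CIFAR10(root='./data', train=True,
                                         download=True, transform=train_tf)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False,
                                        download=True, transform=T.ToTensor())
# CIFAR-100 is loaded the same way via torchvision.datasets.CIFAR100.

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
test_loader = DataLoader(test_set, batch_size=128, shuffle=False)
```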
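
For the Dataset Splits row, the quoted schedule (halve the learning rate once validation accuracy plateaus, patience=10) maps onto PyTorch's ReduceLROnPlateau with mode='max' and factor=0.5. The training/validation helpers and `num_epochs` below are assumptions, not details from the paper.

```python
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Halve the LR when validation accuracy stops improving for 10 epochs.
scheduler = ReduceLROnPlateau(optimizer, mode='max', factor=0.5, patience=10)

for epoch in range(num_epochs):                      # num_epochs: assumed
    train_one_epoch(model, train_loader, optimizer)  # hypothetical helper
    val_acc = validate(model, val_loader)            # hypothetical helper
    scheduler.step(val_acc)                          # step on the monitored metric
```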
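
For the Experiment Setup row, the quoted hyper-parameters map directly onto torch.optim.SGD; `model` is assumed to be the VGG network.

```python
import torch

# Quoted hyper-parameters: lr = 0.1, momentum = 0.9, weight decay = 0.005.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.005)

# For scale-amended SGD on CIFAR-10 the review quotes a smaller lr = 0.005;
# only the learning rate changes:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.005,
#                             momentum=0.9, weight_decay=0.005)
```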