On the Local Hessian in Back-propagation
Authors: Huishuai Zhang, Wei Chen, Tie-Yan Liu
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply it to train neural networks with batch normalization, and achieve favorable results over vanilla SGD. This corroborates the importance of local Hessian from another side. |
| Researcher Affiliation | Industry | Huishuai Zhang, Microsoft Research Asia, Beijing, 100080; Wei Chen, Microsoft Research Asia, Beijing, 100080; Tie-Yan Liu, Microsoft Research Asia, Beijing, 100080 |
| Pseudocode | Yes | Procedure 1 Back-matching Propagation... Algorithm 2 Scale-amended SGD |
| Open Source Code | No | The paper does not provide any concrete statement or link regarding the availability of its source code. |
| Open Datasets | Yes | We next evaluate the scale-amended SGD on training VGG nets [Simonyan and Zisserman, 2015] for image classification tasks with two datasets: CIFAR-10 [Krizhevsky and Hinton, 2009] and CIFAR-100 [Krizhevsky and Hinton, 2009]. |
| Dataset Splits | Yes | We reduce the learning rate by half once the validation accuracy is on a plateau (ReduceLROnPlateau in PyTorch with patience=10). |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments were mentioned. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number. |
| Experiment Setup | Yes | The hyper-parameters for vanilla SGD and our scale-amended SGD are the same, including learning rate 0.1 (because the backward factor for the linear layer of CIFAR-10 is around 10/512, a smaller learning rate of 0.005 works better for CIFAR-10 with scale-amended SGD), momentum 0.9, and weight decay coefficient 0.005. (A PyTorch sketch of this training setup follows the table.) |
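
The dataset, learning-rate-schedule, and hyper-parameter rows above describe a fairly standard PyTorch training loop. The sketch below assembles those pieces (SGD with learning rate 0.1, momentum 0.9, weight decay 0.005, and ReduceLROnPlateau halving the rate with patience=10 on validation accuracy) around CIFAR-10 loaders. The VGG variant, batch size, epoch count, data transforms, validation split, and the `evaluate` helper are assumptions not pinned down in the table, and the paper's scale-amended SGD itself is not reproduced here.

```python
import torch
import torchvision
import torchvision.transforms as transforms

# Stand-in model: torchvision's VGG-16 with a 10-class head. The paper uses
# CIFAR-adapted VGG nets whose exact configuration is not restated in the table.
model = torchvision.models.vgg16(num_classes=10)

# Hyper-parameters quoted in the table: lr=0.1, momentum=0.9, weight decay=0.005.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=0.005)

# "Reduce the learning rate by half once the validation accuracy is on a
# plateau (ReduceLROnPlateau in PyTorch with patience=10)"; mode="max" because
# the monitored quantity is an accuracy.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=10)

criterion = torch.nn.CrossEntropyLoss()

# CIFAR-10 loaders; batch size 128 and the plain ToTensor transform are
# assumptions, and the test split stands in for a validation set.
transform = transforms.ToTensor()
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
val_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                       download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, shuffle=False)


def evaluate(net, loader):
    """Hypothetical helper: top-1 accuracy over a loader."""
    net.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            preds = net(inputs).argmax(dim=1)
            correct += (preds == targets).sum().item()
            total += targets.size(0)
    return correct / total


for epoch in range(100):  # epoch count assumed; the table does not state it
    model.train()
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    # Halve the learning rate when validation accuracy stops improving.
    scheduler.step(evaluate(model, val_loader))
```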