Better SGD using Second-order Momentum
Authors: Hoang Tran, Ashok Cutkosky
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our algorithm not only enjoys optimal theoretical properties, it is also practically effective, as demonstrated through our experimental results across various deep learning tasks. |
| Researcher Affiliation | Academia | Hoang Tran Boston University tranhp@bu.edu Ashok Cutkosky Boston University ashok@cutkosky.com |
| Pseudocode | Yes | Algorithm 1 SGD with Hessian-corrected Momentum (SGDHess) (a hedged sketch of this update appears after the table) |
| Open Source Code | Yes | The link to the code is provided in the appendix. |
| Open Datasets | Yes | Our Cifar10 experiment is conducted using the official implementation of Ada Hessian. ... We also train SGD, SGDHess, and Ada Hessian with Imagenet Deng et al. [2009] on Resnet18... We use the IWSLT 14 German to English dataset that contains 153k/7k/7k in the train/validation/test set. |
| Dataset Splits | Yes | We use the IWSLT 14 German to English dataset that contains 153k/7k/7k in the train/validation/test set. |
| Hardware Specification | Yes | All experiments are run on NVIDIA v100 GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version or any other software dependencies with version numbers. |
| Experiment Setup | Yes | For the rest of the optimizers, we performed a grid search on the base learning rate η ∈ {0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1} to find the best settings. Similar to the Cifar10 experiment of Ada Hessian, we also trained our models for 160 epochs, ran each optimizer 5 times, and reported the average best accuracy as well as the standard deviation (detailed results in the appendix). We use standard parameter values for SGD (lr = 0.1, momentum = 0.9, weight_decay = 1e-4) for both SGD and SGDHess, and the recommended parameter values for Ada Hessian. For the learning rate scheduler, we employ the plateau decay scheduler used in Yao et al. [2020]. We train our model for the usual 90 epochs. (A hedged sketch of this setup also appears after the table.) |
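The pseudocode row above names Algorithm 1, SGD with Hessian-corrected Momentum (SGDHess). The following is a minimal PyTorch sketch of that kind of update, assuming the momentum buffer is corrected with a Hessian-vector product H(x_t)(x_t − x_{t−1}); the function name `sgd_hess_step`, the `state` dictionary, and the default hyperparameters are illustrative choices, not the authors' implementation (their official code is linked in the paper's appendix).

```python
import torch

def sgd_hess_step(params, loss_fn, state, lr=0.1, beta=0.9, weight_decay=1e-4):
    """One step of SGD with Hessian-corrected momentum (illustrative sketch).

    Assumed update rule:
        m <- beta * (m + H(x_t) @ (x_t - x_{t-1})) + (1 - beta) * g(x_t)
        x <- x - lr * m
    where the Hessian-vector product uses the displacement from the previous
    step, so the momentum buffer tracks the gradient at the current iterate.
    """
    loss = loss_fn()
    # First-order gradients with a graph, so a Hessian-vector product can follow.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    disps = state.setdefault("disp", [torch.zeros_like(p) for p in params])
    moms = state.setdefault("momentum", [torch.zeros_like(p) for p in params])

    # Hessian-vector product H(x_t) @ (x_t - x_{t-1}) via double backward.
    hvps = torch.autograd.grad(grads, params, grad_outputs=disps)

    with torch.no_grad():
        for p, g, m, h, d in zip(params, grads, moms, hvps, disps):
            g = g + weight_decay * p                     # standard L2 weight decay
            m.mul_(beta).add_(h, alpha=beta).add_(g, alpha=1 - beta)
            update = -lr * m
            d.copy_(update)                              # x_{t+1} - x_t, reused next step
            p.add_(update)
    return loss

# Hypothetical usage on a tiny quadratic problem:
x = torch.randn(5, requires_grad=True)
state = {}
for _ in range(100):
    sgd_hess_step([x], lambda: (x ** 2).sum(), state, lr=0.1)
```

On the first call the stored displacement is zero, so the correction term vanishes and the step reduces to plain SGD with momentum.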
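The experiment-setup row quotes standard SGD settings (lr = 0.1, momentum = 0.9, weight_decay = 1e-4), a learning-rate grid for the other optimizers, and a plateau decay scheduler over 90 ImageNet epochs. The sketch below reconstructs that configuration with stock PyTorch components; the `nn.Linear` stand-in model, the `ReduceLROnPlateau` parameters, and the placeholder accuracy are assumptions, since the exact scheduler settings from Yao et al. [2020] are not quoted here.

```python
import torch
from torch import nn, optim

model = nn.Linear(10, 2)   # stand-in; the paper trains ResNet-18 on ImageNet

# Learning-rate grid reported for the non-SGD optimizers; a full sweep would
# rebuild the optimizer once per value in this list.
lr_grid = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0]

# Standard SGD settings quoted above, used for both SGD and SGDHess.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)

# "Plateau decay" scheduler: ReduceLROnPlateau is the stock PyTorch version and
# may differ in detail from the scheduler used in Yao et al. [2020].
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode="max",
                                                 factor=0.1, patience=10)

for epoch in range(90):           # ImageNet schedule: 90 epochs
    # ... one training pass over the data would go here ...
    val_accuracy = 0.0            # placeholder for the measured validation accuracy
    scheduler.step(val_accuracy)  # decay the learning rate when accuracy plateaus
```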