On the insufficiency of existing momentum schemes for Stochastic Optimization

Authors: Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical results in this paper show that ASGD has performance gains over HB, NAG, and SGD.
Researcher Affiliation | Collaboration | Rahul Kidambi (University of Washington, Seattle), Praneeth Netrapalli (Microsoft Research India), Prateek Jain (Microsoft Research India), and Sham M. Kakade (University of Washington, Seattle)
Pseudocode | Yes | Algorithm 1 (HB: heavy ball with a SFO, i.e. a stochastic first-order oracle); a sketch of this update appears below the table.
Open Source Code | Yes | The code implementing the ASGD algorithm can be found at https://github.com/rahulkidambi/AccSGD
Open Datasets | Yes | Training deep autoencoders for the MNIST dataset.
Dataset Splits | Yes | We use a validation set based decay scheme, wherein, after every 3 epochs, we decay the learning rate by a certain factor (which we grid search on) if the validation zero-one error does not decrease by at least a certain amount (precise numbers are provided in the appendix since they vary across batch sizes). (A sketch of this decay rule appears below the table.)
Hardware Specification | No | The paper mentions using MATLAB and PyTorch for experiments but does not provide specific hardware details such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | We use Matlab to conduct experiments presented in Section 5.1 and use PyTorch (pytorch, 2017) for our deep networks related experiments.
Experiment Setup | Yes | The network architecture follows previous work (Hinton & Salakhutdinov, 2006) and is represented as 784-1000-500-250-30-250-500-1000-784, with the first and last 784 nodes representing the input and output respectively. All hidden/output nodes employ sigmoid activations except for the layer with 30 nodes, which employs linear activations, and we use MSE loss. Initialization follows the scheme of Martens (2010), also employed in Sutskever et al. (2013); Martens & Grosse (2015). We perform training with two minibatch sizes, 1 and 8. ... We use a validation set based decay scheme, wherein, after every 3 epochs, we decay the learning rate by a certain factor... (A PyTorch sketch of this architecture appears below the table.)
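
The Pseudocode row refers to the paper's Algorithm 1, heavy ball (HB) driven by a stochastic first-order oracle (SFO). A minimal sketch of that update is below; the quadratic oracle, step size, and momentum values are illustrative assumptions, not the paper's settings.

```python
# Minimal sketch of the heavy ball (HB) update with a stochastic
# first-order oracle (SFO). The oracle and hyperparameters below are
# illustrative assumptions, not the paper's exact experimental settings.
import numpy as np

def sfo(w, rng):
    # Assumed stochastic first-order oracle: a noisy gradient of the
    # simple quadratic f(w) = 0.5 * ||w||^2, used only for illustration.
    return w + 0.1 * rng.standard_normal(w.shape)

def heavy_ball(w0, eta=0.1, beta=0.9, steps=1000, seed=0):
    rng = np.random.default_rng(seed)
    w, v = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        g = sfo(w, rng)          # query the stochastic gradient oracle
        v = beta * v - eta * g   # heavy ball (momentum) buffer update
        w = w + v                # parameter update
    return w

print(np.linalg.norm(heavy_ball(np.ones(10))))
```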
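
The validation-based learning rate decay quoted under Dataset Splits can be sketched as follows. The `train_one_epoch` and `validation_error` callables, the decay factor, and the improvement tolerance are placeholders; the paper grid searches the factor and reports the exact numbers in its appendix.

```python
# Sketch of the validation-set-based decay scheme described in the paper:
# every 3 epochs, decay the learning rate by a factor if the validation
# zero-one error has not improved by at least a tolerance.
# decay_factor and min_improvement are placeholder values.
def train_with_validation_decay(train_one_epoch, validation_error,
                                lr=1e-3, epochs=60,
                                decay_factor=0.5, min_improvement=0.01):
    best_err = float("inf")
    for epoch in range(1, epochs + 1):
        train_one_epoch(lr)              # run one epoch at the current lr
        if epoch % 3 == 0:               # check every 3 epochs
            err = validation_error()     # zero-one error on the validation set
            if err > best_err - min_improvement:
                lr *= decay_factor       # insufficient improvement: decay lr
            best_err = min(best_err, err)
    return lr
```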
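
The autoencoder described in the Experiment Setup row can be written in PyTorch roughly as below: the stated 784-1000-500-250-30-250-500-1000-784 topology with sigmoid activations on all hidden/output layers except the linear 30-unit code layer, trained with an MSE loss. This sketch does not reproduce the Martens (2010) initialization or the paper's other training details.

```python
# Sketch of the deep autoencoder architecture described in the paper:
# 784-1000-500-250-30-250-500-1000-784, sigmoid activations everywhere
# except the 30-unit code layer (linear), with an MSE reconstruction loss.
# Initialization and training details from the paper are not reproduced.
import torch
import torch.nn as nn

class DeepAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 1000), nn.Sigmoid(),
            nn.Linear(1000, 500), nn.Sigmoid(),
            nn.Linear(500, 250), nn.Sigmoid(),
            nn.Linear(250, 30),               # linear code layer, no activation
            nn.Linear(30, 250), nn.Sigmoid(),
            nn.Linear(250, 500), nn.Sigmoid(),
            nn.Linear(500, 1000), nn.Sigmoid(),
            nn.Linear(1000, 784), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x)

model = DeepAutoencoder()
criterion = nn.MSELoss()
x = torch.rand(8, 784)                        # minibatch of size 8, one of the two sizes used
loss = criterion(model(x), x)
print(loss.item())
```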