Non-convex Finite-Sum Optimization Via SCSG Methods
Authors: Lihua Lei, Cheng Ju, Jianbo Chen, Michael I. Jordan
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss. |
| Researcher Affiliation | Academia | Lihua Lei UC Berkeley lihua.lei@berkeley.edu Cheng Ju UC Berkeley cju@berkeley.edu Jianbo Chen UC Berkeley jianbochen@berkeley.edu Michael I. Jordan UC Berkeley jordan@stat.berkeley.edu |
| Pseudocode | Yes | Algorithm 1: (Mini-Batch) Stochastically Controlled Stochastic Gradient (SCSG) method for smooth non-convex finite-sum objectives. A hedged sketch of this update appears after the table. |
| Open Source Code | Yes | Our code is available at https://github.com/Jianbo-Lab/SCSG. |
| Open Datasets | Yes | We evaluate SCSG and mini-batch SGD on the MNIST dataset |
| Dataset Splits | No | The paper mentions "training and validation loss" in Figure 1, implying the use of a validation set. However, it only explicitly states the sizes of the training set (50,000 examples) and test set (10,000 examples), and does not specify how the validation set was created or how large it is. |
| Hardware Specification | Yes | All experiments were carried out on an Amazon p2.xlarge node with an NVIDIA GK210 GPU |
| Software Dependencies | Yes | Algorithms implemented in TensorFlow 1.0. |
| Experiment Setup | Yes | We initialized parameters by TensorFlow's default Xavier uniform initializer. In all experiments below, we show the results corresponding to the best-tuned stepsizes. We consider three algorithms: (1) SGD with a fixed batch size B ∈ {512, 1024}; (2) SCSG with a fixed batch size B ∈ {512, 1024} and a fixed mini-batch size b = 32; (3) SCSG with time-varying batch sizes B_j = ⌈j^{3/2} ∧ n⌉ and b_j = ⌈B_j/32⌉. To be clear, given T epochs, the IFO complexity of the three algorithms is TB, 2TB and 2 Σ_{j=1}^{T} B_j, respectively. We run each algorithm with 20 passes of data. A sketch of the batch-size schedule and IFO accounting also appears after the table. |
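The "Pseudocode" row refers to Algorithm 1 (mini-batch SCSG): an outer loop that estimates the gradient on a large batch, followed by an inner loop of geometrically distributed length that applies SVRG-style variance-reduced updates on small mini-batches. The following is a minimal NumPy sketch of that structure; the `grad_fn(x, idx)` oracle, the toy least-squares objective, and the constant stepsize are illustrative assumptions, not the authors' TensorFlow implementation (that code is at https://github.com/Jianbo-Lab/SCSG).

```python
import numpy as np

def scsg(grad_fn, n, x0, num_epochs, batch_size_fn, mini_batch_fn, eta, seed=0):
    """Hedged sketch of mini-batch SCSG.

    grad_fn(x, idx) is assumed to return the average gradient of f_i over
    the indices in idx.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    for j in range(1, num_epochs + 1):
        B_j, b_j = batch_size_fn(j), mini_batch_fn(j)
        # Outer step: gradient over a uniformly sampled batch I_j with |I_j| = B_j.
        I_j = rng.choice(n, size=B_j, replace=False)
        g_j = grad_fn(x_tilde, I_j)
        # Inner-loop length: NumPy's geometric with success prob b_j/(B_j+b_j)
        # has mean (B_j + b_j)/b_j, matching the paper's Geom(B_j/(B_j+b_j))
        # (mean B_j/b_j) up to the support convention.
        N_j = rng.geometric(b_j / (B_j + b_j))
        x, x_anchor = x_tilde.copy(), x_tilde.copy()
        for _ in range(N_j):
            idx = rng.choice(n, size=b_j, replace=False)
            # SVRG-style variance-reduced gradient estimate.
            nu = grad_fn(x, idx) - grad_fn(x_anchor, idx) + g_j
            x = x - eta * nu
        x_tilde = x
    return x_tilde


if __name__ == "__main__":
    # Toy least-squares usage: f_i(x) = 0.5 * (a_i^T x - y_i)^2.
    rng = np.random.default_rng(1)
    n, d = 1000, 20
    A, y = rng.normal(size=(n, d)), rng.normal(size=n)
    grad = lambda x, idx: A[idx].T @ (A[idx] @ x - y[idx]) / len(idx)
    x_hat = scsg(grad, n, np.zeros(d), num_epochs=20,
                 batch_size_fn=lambda j: 512, mini_batch_fn=lambda j: 32,
                 eta=0.05)
```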
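The "Experiment Setup" row quotes the time-varying schedule B_j = ⌈j^{3/2} ∧ n⌉, b_j = ⌈B_j/32⌉ and the IFO counts TB, 2TB and 2 Σ_{j=1}^{T} B_j. The short sketch below reproduces that accounting, assuming the MNIST training-set size n = 50,000 from the Dataset Splits row, T = 20 passes, and the fixed batch size B = 512 from the quoted setup.

```python
import math

# Assumed from the rows above: MNIST training size, number of passes, fixed batch size.
n, T, B = 50_000, 20, 512

# Time-varying schedule quoted in the setup: B_j = ceil(j^{3/2} ∧ n), b_j = ceil(B_j / 32).
B_j = [min(math.ceil(j ** 1.5), n) for j in range(1, T + 1)]
b_j = [math.ceil(Bj / 32) for Bj in B_j]

ifo_sgd        = T * B          # (1) SGD with fixed batch size B
ifo_scsg_fixed = 2 * T * B      # (2) SCSG with fixed B and b = 32
ifo_scsg_vary  = 2 * sum(B_j)   # (3) SCSG with time-varying B_j, b_j

print(ifo_sgd, ifo_scsg_fixed, ifo_scsg_vary)
```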