Non-convex Finite-Sum Optimization Via SCSG Methods

Authors: Lihua Lei, Cheng Ju, Jianbo Chen, Michael I. Jordan

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical experiments demonstrate that SCSG outperforms stochastic gradient methods on training multi-layer neural networks in terms of both training and validation loss.
Researcher Affiliation | Academia | Lihua Lei, UC Berkeley, lihua.lei@berkeley.edu; Cheng Ju, UC Berkeley, cju@berkeley.edu; Jianbo Chen, UC Berkeley, jianbochen@berkeley.edu; Michael I. Jordan, UC Berkeley, jordan@stat.berkeley.edu
Pseudocode | Yes | Algorithm 1: (Mini-Batch) Stochastically Controlled Stochastic Gradient (SCSG) method for smooth non-convex finite-sum objectives (a minimal sketch appears after this table).
Open Source Code | Yes | Our code is available at https://github.com/Jianbo-Lab/SCSG.
Open Datasets | Yes | We evaluate SCSG and mini-batch SGD on the MNIST dataset.
Dataset Splits | No | The paper mentions "training and validation loss" in Figure 1, implying the use of a validation set. However, it only explicitly states the sizes of the training set (50,000 examples) and test set (10,000 examples), and does not specify how the validation set was constructed or how large it is.
Hardware Specification | Yes | All experiments were carried out on an Amazon p2.xlarge node with an NVIDIA GK210 GPU.
Software Dependencies | Yes | Algorithms implemented in TensorFlow 1.0.
Experiment Setup | Yes | We initialized parameters with TensorFlow's default Xavier uniform initializer. In all experiments below, we show the results corresponding to the best-tuned stepsizes. We consider three algorithms: (1) SGD with a fixed batch size B ∈ {512, 1024}; (2) SCSG with a fixed batch size B ∈ {512, 1024} and a fixed mini-batch size b = 32; (3) SCSG with time-varying batch sizes B_j = ⌈j^{3/2} ∧ n⌉ and b_j = ⌈B_j/32⌉. To be clear, given T epochs, the IFO complexities of the three algorithms are TB, 2TB, and 2∑_{j=1}^{T} B_j, respectively. We run each algorithm for 20 passes over the data.
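
For concreteness, below is a minimal sketch of the SCSG procedure referenced in the Pseudocode row, written in plain NumPy-style Python rather than the authors' released TensorFlow code. The names (scsg, grad_fn) and the sampling details (without-replacement batches, uniformly sampled output iterate) are assumptions made for this sketch, not taken from the paper's implementation.

```python
import numpy as np


def scsg(grad_fn, x0, n, T, eta, B, b, seed=0):
    """Sketch of SCSG for a finite-sum objective f(x) = (1/n) * sum_i f_i(x).

    Assumed interface (not the authors' code): grad_fn(x, idx) returns the
    average gradient of the components f_i, i in idx, at x; eta, B and b are
    callables giving the stepsize, batch size and mini-batch size at outer
    iteration j.
    """
    rng = np.random.default_rng(seed)
    x_tilde = np.asarray(x0, dtype=float).copy()
    iterates = []
    for j in range(1, T + 1):
        Bj, bj, eta_j = B(j), b(j), eta(j)
        # Snapshot gradient over a batch of B_j components.
        batch = rng.choice(n, size=Bj, replace=False)
        g = grad_fn(x_tilde, batch)
        x, x_snap = x_tilde.copy(), x_tilde.copy()
        # Geometric number of inner steps with mean B_j / b_j
        # (numpy's geometric is supported on {1, 2, ...}, hence the -1).
        N = rng.geometric(bj / (Bj + bj)) - 1
        for _ in range(N):
            idx = rng.choice(n, size=bj, replace=False)
            # SVRG-style variance-reduced gradient estimate.
            nu = grad_fn(x, idx) - grad_fn(x_snap, idx) + g
            x = x - eta_j * nu
        x_tilde = x
        iterates.append(x_tilde.copy())
    # The non-convex analysis outputs a randomly sampled iterate
    # (uniform sampling is used here for simplicity).
    return iterates[rng.integers(len(iterates))]
```

As a usage illustration, for a toy least-squares problem grad_fn could be `lambda x, idx: A[idx].T @ (A[idx] @ x - y[idx]) / len(idx)`.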
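
The time-varying schedule and the IFO budgets quoted in the Experiment Setup row can also be made concrete with a short sketch. The helper name time_varying_sizes is hypothetical; n = 50,000 is the MNIST training-set size mentioned in the Dataset Splits row, and T = 20 is used only as an illustrative number of outer iterations.

```python
import math

n = 50_000  # MNIST training-set size stated above
T = 20      # illustrative number of outer iterations (assumption, not from the paper)


def time_varying_sizes(j, n):
    """Time-varying SCSG sizes as quoted above:
    B_j = ceil(min(j**1.5, n)), b_j = ceil(B_j / 32)."""
    Bj = math.ceil(min(j ** 1.5, n))
    bj = math.ceil(Bj / 32)
    return Bj, bj


# IFO budgets as stated in the setup: T*B for SGD, 2*T*B for fixed-batch
# SCSG, and 2 * sum_{j=1}^{T} B_j for time-varying SCSG.
B_fixed = 512
print("SGD:            ", T * B_fixed)
print("SCSG (fixed B): ", 2 * T * B_fixed)
print("SCSG (varying): ", 2 * sum(time_varying_sizes(j, n)[0] for j in range(1, T + 1)))
```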