Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization

Authors: Sashank J. Reddi, Suvrit Sra, Barnabas Poczos, Alexander J. Smola

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present our empirical results in this section. For our experiments, we study the problem of non-negative principal component analysis (NN-PCA). More specifically, for a given set of samples $\{z_i\}_{i=1}^{n}$, we solve the following optimization problem: $\min_{\|x\| \le 1,\, x \ge 0} -\frac{1}{2} x^\top \left( \sum_{i=1}^{n} z_i z_i^\top \right) x$. Figure 2 (plots omitted; x-axis is #grad/n): Non-negative principal component analysis. Performance of PROXSGD, PROXSVRG and PROXSAGA on rcv1 (left), a9a (left-center), mnist (right-center) and aloi (right) datasets. Here, the y-axis is the function suboptimality, i.e., $f(x) - f(\hat{x})$, where $\hat{x}$ represents the best solution obtained by running gradient descent for a long time and with multiple restarts.
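To make the quoted NN-PCA objective concrete, here is a minimal sketch of the loss, a per-sample gradient, and the Euclidean projection onto the constraint set (the proximal operator of the constraint's indicator function). It assumes the samples $z_i$ are the rows of a dense NumPy array Z; all names are illustrative, not from the paper's code.

```python
import numpy as np

def f(x, Z):
    """NN-PCA objective -(1/2) x^T (sum_i z_i z_i^T) x,
    i.e. -(1/2) * sum_i (z_i^T x)^2, with samples as rows of Z."""
    return -0.5 * np.sum((Z @ x) ** 2)

def grad_i(i, x, Z):
    """Gradient of the i-th summand f_i(x) = -(1/2) (z_i^T x)^2."""
    return -(Z[i] @ x) * Z[i]

def project(x):
    """Projection onto {x : ||x|| <= 1, x >= 0}: clip the negative
    coordinates, then rescale onto the unit ball if needed."""
    x = np.maximum(x, 0.0)
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 1.0 else x
```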
Researcher Affiliation | Academia | Sashank J. Reddi (Carnegie Mellon University, sjakkamr@cs.cmu.edu); Suvrit Sra (Massachusetts Institute of Technology, suvrit@mit.edu); Barnabás Póczos (Carnegie Mellon University, bapoczos@cs.cmu.edu); Alexander J. Smola (Carnegie Mellon University, alex@smola.org)
Pseudocode | Yes | Algorithm 1: Nonconvex PROXSVRG; Algorithm 2: Nonconvex PROXSAGA; Figure 1: PROXSVRG and PROXSAGA variants for PL functions.
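For concreteness, a minimal sketch of the nonconvex PROXSVRG outer/inner loop (Algorithm 1) with constant minibatch size b = 1, assuming a per-sample gradient oracle grad_i(i, x) and a prox operator prox(u, eta) for the nonsmooth term; this illustrates the update structure under those assumptions, not the paper's implementation.

```python
import numpy as np

def prox_svrg(x0, grad_i, prox, n, eta, epochs, m=None):
    """Nonconvex PROXSVRG sketch: each epoch takes a snapshot, computes
    its full gradient, then runs m variance-reduced proximal steps."""
    m = n if m is None else m  # the experiments use epoch length m = n
    x = x0.copy()
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        x_snap = x.copy()
        g_full = np.mean([grad_i(i, x_snap) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            v = grad_i(i, x) - grad_i(i, x_snap) + g_full  # VR gradient
            x = prox(x - eta * v, eta)                     # proximal step
    return x
```

In the NN-PCA experiments the nonsmooth term is the indicator of the constraint set, so prox reduces to the projection onto {x : ||x|| <= 1, x >= 0} and ignores eta.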
Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or state that code is released.
Open Datasets | Yes | We use standard machine learning datasets in LIBSVM for all our experiments. The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets.
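Files in LIBSVM format can be read directly with scikit-learn; a minimal sketch, where the local file name is an assumption for illustration:

```python
from sklearn.datasets import load_svmlight_file

# Read a LIBSVM-format file (e.g. a9a downloaded from the URL above).
X, y = load_svmlight_file("a9a")
print(X.shape, y.shape)  # sparse feature matrix and label vector
```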
Dataset Splits | No | The paper mentions using PROXSGD for initialization and then evaluating objective function values, but does not specify any explicit train/validation/test splits, percentages, or sample counts.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | The choice of step size is important to PROXSGD. The step size of PROXSGD is set using the popular t-inverse step size choice $\eta_t = \eta_0 (1 + \eta' \lfloor t/n \rfloor)^{-1}$ where $\eta_0, \eta' > 0$. For PROXSVRG and PROXSAGA, motivated by the theoretical analysis, we use a fixed step size. The parameters of the step size in each of these methods are chosen so that the method gives the best performance on the objective value. ... For PROXSVRG, we use the epoch length m = n. ... Each of these methods is initialized by running PROXSGD for n iterations. ... In our experiments, we use b = 1 in order to demonstrate the performance of the algorithms with constant minibatches.
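The quoted t-inverse schedule is simple to state in code; a minimal sketch, where the eta0 and eta_prime defaults are placeholders rather than the tuned values (the paper selects these by best objective value and does not report them):

```python
def t_inverse_step(t, n, eta0=0.1, eta_prime=0.5):
    """t-inverse schedule eta_t = eta0 * (1 + eta' * floor(t/n))**(-1)
    used for PROXSGD; PROXSVRG and PROXSAGA use a fixed step size."""
    return eta0 / (1.0 + eta_prime * (t // n))
```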