Proximal Stochastic Methods for Nonsmooth Nonconvex Finite-Sum Optimization
Authors: Sashank J. Reddi, Suvrit Sra, Barnabás Póczos, Alexander J. Smola
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present our empirical results in this section. For our experiments, we study the problem of non-negative principal component analysis (NN-PCA). More specifically, for a given set of samples $\{z_i\}_{i=1}^{n}$, we solve the following optimization problem: $\min_{\|x\| \le 1,\, x \ge 0} -\tfrac{1}{2} x^\top \big(\sum_{i=1}^{n} z_i z_i^\top\big) x$. (A NumPy sketch of this objective and its projection step follows the table.) [Figure 2: Non-negative principal component analysis. Performance of PROXSGD, PROXSVRG and PROXSAGA on rcv1 (left), a9a (left-center), mnist (right-center) and aloi (right) datasets; the y-axis is the function suboptimality, i.e., $f(x) - f(\hat{x})$, where $\hat{x}$ is the best solution obtained by running gradient descent for a long time and with multiple restarts.] |
| Researcher Affiliation | Academia | Sashank J. Reddi (Carnegie Mellon University, sjakkamr@cs.cmu.edu); Suvrit Sra (Massachusetts Institute of Technology, suvrit@mit.edu); Barnabás Póczos (Carnegie Mellon University, bapoczos@cs.cmu.edu); Alexander J. Smola (Carnegie Mellon University, alex@smola.org) |
| Pseudocode | Yes | Algorithm 1: Nonconvex PROXSVRG, Algorithm 2: Nonconvex PROXSAGA, Figure 1: PROXSVRG and PROXSAGA variants for PL functions. (A minimal PROXSVRG sketch appears after the table.) |
| Open Source Code | No | The paper does not provide any links to open-source code for the described methodology or state that code is released. |
| Open Datasets | Yes | We use standard machine learning datasets in LIBSVM for all our experiments. The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets. |
| Dataset Splits | No | The paper mentions using PROXSGD for initialization and then evaluating objective function values, but does not specify any explicit train/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, processor types, or memory used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | The choice of step size is important to PROXSGD. The step size of PROXSGD is set using the popular t-inverse choice $\eta_t = \eta_0 (1 + \eta_0' \lfloor t/n \rfloor)^{-1}$, where $\eta_0, \eta_0' > 0$. For PROXSVRG and PROXSAGA, motivated by the theoretical analysis, we use a fixed step size. The step-size parameters of each method are chosen so that the method gives the best performance on the objective value. ... For PROXSVRG, we use the epoch length m = n. ... Each of these methods is initialized by running PROXSGD for n iterations. ... In our experiments, we use b = 1 in order to demonstrate the performance of the algorithms with constant minibatches. (A minimal PROXSGD loop with this step-size schedule appears after the table.) |
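
For concreteness, here is a minimal NumPy sketch of the NN-PCA objective and of the proximal step, which for this problem is the projection onto the constraint set $\{x : x \ge 0, \|x\| \le 1\}$. This is our own illustration, not code from the paper (none is released); the function names are ours, and the objective is written as an average over samples, which differs from the paper's sum only by a rescaling of the step size.

```python
import numpy as np

def nnpca_objective(x, Z):
    """NN-PCA objective, averaged form: f(x) = -(1/2n) * sum_i (z_i^T x)^2,
    with the samples z_i stored as the rows of Z."""
    return -0.5 * np.mean((Z @ x) ** 2)

def project_nonneg_ball(x):
    """Exact projection onto {x : x >= 0, ||x|| <= 1}: clip the negative
    coordinates to zero, then rescale into the unit ball if necessary."""
    x = np.maximum(x, 0.0)
    return x / max(1.0, np.linalg.norm(x))
```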
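The pseudocode itself lives in the paper; as a hedged reading of Algorithm 1, a nonconvex PROXSVRG loop for NN-PCA might look as follows. This is our sketch under the experimental setup above (b = 1, fixed step size, epoch length m = n); `eta` is a placeholder rather than a tuned value, and it reuses `numpy` and `project_nonneg_ball` from the sketch above.

```python
def prox_svrg_nnpca(Z, x0, eta=1e-3, epochs=15, seed=0):
    """Sketch of nonconvex PROXSVRG for NN-PCA (b = 1, fixed step size,
    epoch length m = n). Here grad f_i(x) = -(z_i^T x) z_i for
    f_i(x) = -1/2 (z_i^T x)^2, and f is the average of the f_i."""
    rng = np.random.default_rng(seed)
    n, _ = Z.shape
    x = x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()
        g_full = -(Z.T @ (Z @ x_snap)) / n            # full gradient at the snapshot
        for _ in range(n):                            # inner epoch of length m = n
            i = rng.integers(n)
            # variance-reduced gradient: grad_i(x) - grad_i(x_snap) + g_full
            v = -(Z[i] @ x) * Z[i] + (Z[i] @ x_snap) * Z[i] + g_full
            x = project_nonneg_ball(x - eta * v)      # proximal (projection) step
    return x
```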
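Finally, a minimal PROXSGD loop matching the setup row: b = 1 and the t-inverse step size $\eta_t = \eta_0(1 + \eta_0' \lfloor t/n \rfloor)^{-1}$, which decays once per pass over the data. The default values of `eta0` and `eta0_prime` below are placeholders; the paper tunes these per dataset and does not report the chosen values. This again reuses the helpers from the first sketch.

```python
def prox_sgd_nnpca(Z, x0, eta0=0.1, eta0_prime=0.5, epochs=15, seed=0):
    """Sketch of PROXSGD for NN-PCA with minibatch size b = 1 and the
    t-inverse step size eta_t = eta0 / (1 + eta0_prime * floor(t/n))."""
    rng = np.random.default_rng(seed)
    n, _ = Z.shape
    x = x0.copy()
    for t in range(epochs * n):
        eta = eta0 / (1.0 + eta0_prime * (t // n))    # decays once per epoch
        i = rng.integers(n)
        g = -(Z[i] @ x) * Z[i]                        # stochastic gradient of f_i
        x = project_nonneg_ball(x - eta * g)          # proximal (projection) step
    return x
```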