A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Authors: Zhize Li, Jian Li
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct several experiments, and the experimental results are consistent with the theoretical results. We compare the nonconvex ProxSVRG+ with nonconvex ProxGD, ProxSGD [10], and ProxSVRG [24] on the nonnegative principal component analysis (NN-PCA) problem (same as [24]). The experimental results on both datasets (corresponding to the first and second rows in Figures 3–5) are almost the same. |
| Researcher Affiliation | Academia | Zhize Li, IIIS, Tsinghua University, zz-li14@mails.tsinghua.edu.cn; Jian Li, IIIS, Tsinghua University, lijian83@mail.tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1: Nonconvex ProxSVRG+ (a hedged code sketch appears after this table) |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology or a direct link to a code repository. |
| Open Datasets | Yes | We conduct the experiment on the standard MNIST and a9a datasets. The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/ |
| Dataset Splits | No | The paper mentions using the 'standard MNIST and a9a datasets' but does not provide specific details on how these datasets were split into training, validation, and testing sets, nor does it refer to predefined splits with citations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or versions (e.g., library names with version numbers like PyTorch 1.9, TensorFlow 2.x) needed to replicate the experiment. |
| Experiment Setup | Yes | The step sizes η for the different algorithms are set to the ones used in their convergence results: for ProxGD, η = 1/L (see Corollary 1 in [10]); for ProxSGD, η = 1/(2L) (see Corollary 3 in [10]); for ProxSVRG, η = b^{3/2}/(3Ln) (see Theorem 6 in [24]). The step size for our ProxSVRG+ is 1/(6L) (see our Theorem 1). The batch size B (in Line 4 of Algorithm 1) is equal to n/5 (i.e., 20% of the data samples). These settings are used in the sketch below. |
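
To make the quoted setup concrete, here is a minimal sketch of nonconvex ProxSVRG+ (Algorithm 1) applied to the NN-PCA problem used in the experiments. It assumes the standard NN-PCA formulation min_x −(1/(2n)) Σᵢ (zᵢᵀx)² over the feasible set {x ≥ 0, ‖x‖₂ ≤ 1}, so the nonsmooth term is the indicator of that set and the proximal step reduces to a projection. The step size η = 1/(6L) and batch size B = n/5 follow the Experiment Setup row; the helper names (`proj_nnball`, `nnpca_grads`, `prox_svrg_plus`), the minibatch size `b`, and the epoch length `m` are illustrative choices, not values taken from the paper.

```python
import numpy as np

def proj_nnball(x):
    """Prox of the indicator of {x : x >= 0, ||x||_2 <= 1}:
    projection onto the nonnegative part of the unit ball."""
    x = np.maximum(x, 0.0)
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 1.0 else x

def nnpca_grads(Z_batch, x):
    """Per-sample gradients of f_i(x) = -(1/2) (z_i^T x)^2,
    i.e. grad f_i(x) = -(z_i^T x) z_i; returns one row per sample."""
    return -(Z_batch @ x)[:, None] * Z_batch

def prox_svrg_plus(Z, eta, B, b, m, num_epochs, rng):
    """Sketch of nonconvex ProxSVRG+ (Algorithm 1) on NN-PCA."""
    n, d = Z.shape
    x = proj_nnball(np.abs(rng.standard_normal(d)))  # feasible, nonzero start
    for _ in range(num_epochs):
        # Snapshot point and batch gradient estimator (Line 4 of Algorithm 1);
        # B = n would recover full SVRG, the experiments use B = n/5.
        x_snap = x.copy()
        idx_B = rng.choice(n, size=B, replace=False)
        g_snap = nnpca_grads(Z[idx_B], x_snap).mean(axis=0)
        for _ in range(m):
            # Variance-reduced stochastic gradient on a size-b minibatch.
            idx_b = rng.choice(n, size=b, replace=False)
            v = (nnpca_grads(Z[idx_b], x)
                 - nnpca_grads(Z[idx_b], x_snap)).mean(axis=0) + g_snap
            x = proj_nnball(x - eta * v)  # proximal (projected) gradient step
    return x
```

With unit-norm samples each f_i is 1-smooth, so L = 1 and η = 1/(6L) = 1/6. A synthetic run (standing in for the MNIST/a9a data, which the paper takes from the LIBSVM site) might look like:

```python
rng = np.random.default_rng(0)
Z = rng.standard_normal((1000, 50))
Z /= np.linalg.norm(Z, axis=1, keepdims=True)  # unit-norm rows => L = 1
x_hat = prox_svrg_plus(Z, eta=1/6, B=len(Z)//5, b=64, m=30,
                       num_epochs=20, rng=rng)
```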