A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization

Authors: Zhize Li, Jian Li

Venue: NeurIPS 2018

Reproducibility assessment. Each item below lists the reproducibility variable, the result, and the supporting LLM response.

Research Type: Experimental. "Finally, we conduct several experiments and the experimental results are consistent with the theoretical results." The paper compares the nonconvex ProxSVRG+ against nonconvex ProxGD, ProxSGD [10], and ProxSVRG [24] on the nonnegative principal component analysis (NN-PCA) problem (same as [24]), and reports that "the experimental results on both datasets (corresponding to the first row and second row in Figures 3-5) are almost the same."

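The NN-PCA problem is only named above, so here is a minimal Python sketch of it, assuming the standard formulation used in [24]: minimize -(1/(2n)) * sum_i (z_i^T x)^2 over {x : x >= 0, ||x||_2 <= 1}, where the constraint set enters through the nonsmooth term h as an indicator function. All names are illustrative, not taken from the authors' code.

```python
import numpy as np

def f(x, Z):
    """Smooth, nonconvex part: negative average squared projection onto x."""
    return -0.5 * np.mean((Z @ x) ** 2)

def grad_f(x, Z_batch):
    """Stochastic gradient of f over a minibatch of rows Z_batch."""
    return -(Z_batch.T @ (Z_batch @ x)) / Z_batch.shape[0]

def prox_h(x, eta=None):
    """Prox of the indicator of {x : x >= 0, ||x||_2 <= 1}.

    For an indicator function the prox is a projection and ignores eta:
    clip negatives, then rescale onto the unit ball if needed.
    """
    x = np.maximum(x, 0.0)
    nrm = np.linalg.norm(x)
    return x / nrm if nrm > 1.0 else x
```
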
Researcher Affiliation: Academia. Zhize Li (IIIS, Tsinghua University, zz-li14@mails.tsinghua.edu.cn) and Jian Li (IIIS, Tsinghua University, lijian83@mail.tsinghua.edu.cn).

Pseudocode: Yes. The paper gives pseudocode as Algorithm 1 (nonconvex ProxSVRG+).

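Algorithm 1 itself is not reproduced in this summary. As a rough guide, a hedged sketch of the ProxSVRG+ loop structure (an outer snapshot gradient computed on a batch of size B, then inner variance-reduced proximal steps on minibatches of size b) might look like the following; the parameter names and helper signatures are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def prox_svrg_plus(grad_f, prox_h, x0, n, S, m, B, b, eta, rng=None):
    """Sketch of nonconvex ProxSVRG+ (after Algorithm 1 in the paper).

    grad_f(x, idx) should return the average gradient of f_i over the
    index set idx; prox_h(x, eta) is the proximal operator of eta * h.
    """
    rng = np.random.default_rng() if rng is None else rng
    x_tilde = x0.copy()
    iterates = []
    for s in range(S):
        # Outer loop: snapshot gradient on a subsampled batch of size B.
        I_B = rng.choice(n, size=B, replace=False)
        g = grad_f(x_tilde, I_B)
        x = x_tilde.copy()
        for k in range(m):
            # Inner loop: variance-reduced gradient on a minibatch of size b.
            I_b = rng.choice(n, size=b, replace=True)
            v = grad_f(x, I_b) - grad_f(x_tilde, I_b) + g
            # Proximal gradient step handles the nonsmooth term h.
            x = prox_h(x - eta * v, eta)
            iterates.append(x.copy())
        x_tilde = x.copy()
    # Return an iterate chosen uniformly at random, as in the analysis.
    return iterates[rng.integers(len(iterates))]
```

Note that taking B = n makes the snapshot gradient exact, recovering a ProxSVRG-style outer loop; ProxSVRG+ also allows B < n, as in the experiments below where B = n/5.
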
Open Source Code: No. The paper contains no explicit statement about releasing source code for the described methodology and no link to a code repository.

Open Datasets: Yes. "We conduct the experiment on the standard MNIST and a9a datasets." The datasets can be downloaded from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/

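For anyone reproducing the setup, the LIBSVM-format files can be read with scikit-learn's load_svmlight_file. The file names below are assumptions about which files were downloaded from that page; this snippet is not from the paper.

```python
from sklearn.datasets import load_svmlight_file

# Assumed file names from the LIBSVM dataset page linked above.
X_a9a, y_a9a = load_svmlight_file("a9a")              # 32,561 samples, 123 features
X_mnist, y_mnist = load_svmlight_file("mnist.scale")  # 60,000 samples, 780 features

# Dense sample matrix Z for use with the NN-PCA sketch above.
Z = X_a9a.toarray()
```
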
Dataset Splits: No. The paper mentions the standard MNIST and a9a datasets but does not say how they were split into training, validation, and test sets, nor does it cite predefined splits.

Hardware Specification: No. The paper does not report the hardware (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments.

Software Dependencies: No. The paper does not list the software dependencies or versions (e.g., library names with version numbers) needed to replicate the experiments.

Experiment Setup: Yes. The step sizes η are set to the values used in each method's convergence results: for ProxGD, η = 1/L (Corollary 1 in [10]); for ProxSGD, η = 1/(2L) (Corollary 3 in [10]); for ProxSVRG, η = b^{3/2}/(3Ln) (Theorem 6 in [24]); and for ProxSVRG+, η = 1/(6L) (Theorem 1). The batch size B (Line 4 of Algorithm 1) is n/5, i.e., 20% of the data samples.

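These reported hyperparameters translate directly into a small helper. This is an illustrative sketch of the stated values, given the smoothness constant L, the sample count n, and the ProxSVRG minibatch size b; it is not code from the paper.

```python
def step_sizes(L, n, b):
    """Step sizes eta used in the experiments, per each method's theory."""
    return {
        "ProxGD": 1.0 / L,                     # Corollary 1 in [10]
        "ProxSGD": 1.0 / (2.0 * L),            # Corollary 3 in [10]
        "ProxSVRG": b ** 1.5 / (3.0 * L * n),  # Theorem 6 in [24]
        "ProxSVRG+": 1.0 / (6.0 * L),          # Theorem 1
    }

def batch_size(n):
    """Batch size B from Line 4 of Algorithm 1: 20% of the n samples."""
    return n // 5
```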