Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization
Authors: Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang
AAAI 2019, pp. 1503–1510
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, the experimental results verify that our algorithms have a faster convergence rate than the existing zeroth-order proximal stochastic algorithm. |
| Researcher Affiliation | Collaboration | 1. College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China; 2. Department of Electrical & Computer Engineering, University of Pittsburgh, PA 15261, USA; 3. JDDGlobal.com |
| Pseudocode | Yes | Algorithm 1 ZO-ProxSVRG for Nonconvex Optimization and Algorithm 2 ZO-ProxSAGA for Nonconvex Optimization (a minimal sketch of this style of update appears after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | In the experiment, we use the publicly available real datasets1, which are summarized in Table 2. Footnote 1: 20news is from the website https://cs.nyu.edu/~roweis/data.html; a9a, w8a and covtype.binary are from the website www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/. |
| Dataset Splits | Yes | For each dataset, we use half of the samples as training data, and the rest as testing data. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming languages, libraries, or frameworks). |
| Experiment Setup | Yes | In the algorithms, we fix the mini-batch size b = 20, and the smoothing parameters µ = 1/(dt) in the GauSGE and µ = 1/(d√t) in the CooSGE. Meanwhile, we fix λ1 = λ2 = 10⁻⁵, and use the same initial solution x0, drawn from the standard normal distribution, in each experiment. In the second experiment, we select n = 10 examples from the same class, and set the batch size b = 5 and a constant step size η = 1/d for the zeroth-order algorithms, where d = 28 × 28. In addition, we set λ1 = 10⁻³ and λ2 = 1 in that experiment. (See the setup sketch after this table.) |
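
For concreteness, below is a minimal NumPy sketch of the kind of update Algorithm 1 (ZO-ProxSVRG) describes: a two-point Gaussian-smoothing gradient estimator combined with an SVRG-style variance-reduced step and an ℓ1 proximal operator. This is a sketch under assumptions, not the authors' code: the function names, the logistic loss, the single random direction per gradient estimate, and all hyperparameter values are illustrative.

```python
import numpy as np

def prox_l1(x, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def zo_grad(f, x, mu, u):
    """Two-point Gaussian-smoothing estimator along a direction u ~ N(0, I):
    ((f(x + mu*u) - f(x)) / mu) * u estimates the gradient of the
    mu-smoothed objective (one direction used here for brevity)."""
    return (f(x + mu * u) - f(x)) / mu * u

def zo_prox_svrg_epoch(loss, X, y, x, eta, mu, lam, b, m, rng):
    """One SVRG-style epoch (sketch): take a snapshot and a full zeroth-order
    gradient, then run m variance-reduced proximal steps on size-b batches."""
    n = X.shape[0]
    x_tilde = x.copy()
    u0 = rng.standard_normal(x.shape)
    g_tilde = zo_grad(lambda w: loss(w, X, y), x_tilde, mu, u0)
    for _ in range(m):
        idx = rng.choice(n, size=b, replace=False)  # sample a mini-batch
        f_mb = lambda w, i=idx: loss(w, X[i], y[i])
        u = rng.standard_normal(x.shape)            # direction shared by both terms
        v = zo_grad(f_mb, x, mu, u) - zo_grad(f_mb, x_tilde, mu, u) + g_tilde
        x = prox_l1(x - eta * v, eta * lam)         # proximal (soft-threshold) step
    return x

# Illustrative usage with a synthetic logistic loss (labels in {-1, +1}):
def logistic_loss(w, X, y):
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 50))
y = rng.choice([-1.0, 1.0], size=200)
w = rng.standard_normal(50)
w = zo_prox_svrg_epoch(logistic_loss, X, y, w,
                       eta=0.02, mu=1e-3, lam=1e-5, b=20, m=10, rng=rng)
```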
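
And a small sketch of the reported data handling and hyperparameters. The split function, schedule function, and dictionary names are our assumptions; the paper does not say whether the half/half split is shuffled, and the µ schedules below read the quoted "1/dt" and "1/d√t" as 1/(d·t) and 1/(d·√t).

```python
import numpy as np

def half_split(X, y, rng):
    """Use half of the samples as training data and the rest as testing data.
    A shuffled split is assumed; the paper does not specify the ordering."""
    perm = rng.permutation(len(y))
    half = len(y) // 2
    return X[perm[:half]], y[perm[:half]], X[perm[half:]], y[perm[half:]]

def mu_schedule(d, t, estimator):
    """Smoothing parameter at iteration t, as quoted in the setup row,
    reading '1/dt' as 1/(d*t) and '1/d*sqrt(t)' as 1/(d*sqrt(t))."""
    if estimator == "GauSGE":
        return 1.0 / (d * t)
    return 1.0 / (d * np.sqrt(t))  # CooSGE

# Reported hyperparameters for the two experiments (dict names are ours):
binary_classification = {"b": 20, "lam1": 1e-5, "lam2": 1e-5}
image_experiment = {"n": 10, "b": 5, "d": 28 * 28,
                    "eta": 1.0 / (28 * 28),  # constant step size eta = 1/d
                    "lam1": 1e-3, "lam2": 1.0}

# Initial solution drawn from the standard normal distribution:
rng = np.random.default_rng(0)
x0 = rng.standard_normal(image_experiment["d"])
```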