Faster Gradient-Free Proximal Stochastic Methods for Nonconvex Nonsmooth Optimization

Authors: Feihu Huang, Bin Gu, Zhouyuan Huo, Songcan Chen, Heng Huang

AAAI 2019, pp. 1503–1510 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Finally, the experimental results verify that our algorithms have a faster convergence rate than the existing zeroth-order proximal stochastic algorithm.
Researcher Affiliation | Collaboration | College of Computer Science & Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, 211106, China; Department of Electrical & Computer Engineering, University of Pittsburgh, PA 15261, USA; JDDGlobal.com
Pseudocode | Yes | Algorithm 1 (ZO-ProxSVRG for Nonconvex Optimization) and Algorithm 2 (ZO-ProxSAGA for Nonconvex Optimization); a generic zeroth-order proximal update is sketched after this table.
Open Source Code | No | The paper does not provide an explicit statement about the release of source code for the described methodology or a link to a code repository.
Open Datasets | Yes | In the experiment, we use the publicly available real datasets¹, which are summarized in Table 2. Footnote 1: 20news is from the website https://cs.nyu.edu/~roweis/data.html; a9a, w8a and covtype.binary are from the website www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.
Dataset Splits | Yes | For each dataset, we use half of the samples as training data, and the rest as testing data. (A loading and splitting sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or memory specifications used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., programming languages, libraries, or frameworks).
Experiment Setup | Yes | In the algorithms, we fix the mini-batch size b = 20 and the smoothing parameters µ = 1/(dt) in the GauSGE and µ = 1/(d√t) in the CooSGE. Meanwhile, we fix λ1 = λ2 = 10^-5, and use the same initial solution x0, drawn from the standard normal distribution, in each experiment. In the experiment, we select n = 10 examples from the same class, and set the batch size b = 5 and a constant step size η = 1/d for the zeroth-order algorithms, where d = 28 × 28. In addition, we set λ1 = 10^-3 and λ2 = 1 in the experiment. (These values are instantiated in the sketch after the table.)
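
The "Pseudocode" and "Experiment Setup" rows refer to zeroth-order (gradient-free) proximal updates built from smoothed gradient estimators. The Python sketch below is not the paper's Algorithm 1 (ZO-ProxSVRG) or Algorithm 2 (ZO-ProxSAGA); it only illustrates, under our own naming, the ingredients such methods combine: a Gaussian-smoothing gradient estimator, a coordinate-wise estimator, and an ℓ1 proximal (soft-thresholding) step, instantiated with the step size, smoothing schedule, and regularization weight quoted above. The placeholder loss, toy dimension, and function names are assumptions for illustration only.

```python
import numpy as np

def gauss_grad_estimate(f, x, mu, rng):
    """One-sample Gaussian-smoothing estimate of grad f(x) (GauSGE-style)."""
    u = rng.standard_normal(x.shape[0])
    return (f(x + mu * u) - f(x)) / mu * u

def coord_grad_estimate(f, x, mu):
    """Coordinate-wise central-difference estimate of grad f(x) (CooSGE-style)."""
    g = np.zeros_like(x)
    for j in range(x.shape[0]):
        e = np.zeros_like(x)
        e[j] = 1.0
        g[j] = (f(x + mu * e) - f(x - mu * e)) / (2.0 * mu)
    return g

def prox_l1(v, lam):
    """Proximal operator of lam * ||.||_1 (soft thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

# One illustrative proximal zeroth-order step, plugging in values quoted in the
# table: eta = 1/d, lambda1 = 1e-5, and mu = 1/(d*t) for the Gaussian estimator.
rng = np.random.default_rng(0)
d, t = 100, 1                                 # toy dimension and iteration counter
eta, lam1 = 1.0 / d, 1e-5
f = lambda x: np.log1p(np.exp(-x)).sum()      # placeholder smooth loss, not the paper's
x = rng.standard_normal(d)                    # initial point drawn from N(0, I)
g = gauss_grad_estimate(f, x, mu=1.0 / (d * t), rng=rng)
x = prox_l1(x - eta * g, eta * lam1)
```

The variance-reduction corrections that distinguish the paper's SVRG- and SAGA-style algorithms (full or stored reference gradient estimates) are deliberately omitted here; the sketch only shows the estimator-plus-proximal-step skeleton they share.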
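For the "Open Datasets" and "Dataset Splits" rows, the following is a minimal loading sketch under the assumption that the LIBSVM copies of a9a, w8a and covtype.binary are read with scikit-learn; the paper does not state which tooling it used, and the file path below is a placeholder.

```python
# Hypothetical loading of a LIBSVM-format dataset and the 50/50 split reported
# in the table; scikit-learn utilities are an assumption, not the paper's tooling.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

X, y = load_svmlight_file("a9a")  # file downloaded from the LIBSVM datasets page
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)  # half for training, half for testing
```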