General Stability Analysis for Zeroth-Order Optimization Algorithms
Authors: Xinyue Liu, Hualin Zhang, Bin Gu, Hong Chen
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | NUMERICAL EXPERIMENTS: In this section, we assess the generalization errors associated with optimizing nonconvex loss functions using ZO-GD, ZO-SGD, and ZO-SVRG. The primary goal is to verify the generalization errors of different zeroth-order optimization algorithms and different gradient estimators. To achieve this, we conduct experiments on two nonconvex models: nonconvex logistic regression and a two-layer neural network. |
| Researcher Affiliation | Academia | (1) College of Informatics, Huazhong Agricultural University, China; (2) Engineering Research Center of Intelligent Technology for Agriculture, China; (3) School of Artificial Intelligence, Jilin University, China; (4) Mohamed bin Zayed University of Artificial Intelligence, UAE |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is publicly available. |
| Open Datasets | Yes | For both nonconvex models, we use the Australian dataset from LIBSVM. |
| Dataset Splits | Yes | We separate the dataset into two parts: 80% for training and 20% for test. |
| Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers. |
| Experiment Setup | Yes | For all experiments, we set the maximum number of iterations to 2000. The batch size of the stochastic gradient is set to 50. The initial learning rate is set to 0.01 and is decreased every T iterations by a factor of γ. Both T and γ are determined by a grid search, with T chosen from {30, 60, 100, 150, 200, 250} and γ from {0.6, 0.7, 0.8, 0.9}. For the 2-point gradient estimator, we also conduct a grid search over the parameter K, chosen from {2, 3, 4, 6, 8, 9, 12}. (Hedged sketches of the 2-point estimator and of this step-decay/grid-search setup follow the table.) |
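
For context on the methods named in the Research Type and Experiment Setup rows, below is a minimal sketch (not the authors' code) of the classic 2-point zeroth-order gradient estimator that ZO-GD and ZO-SGD-style methods build on: the gradient is approximated from two loss evaluations along a random direction. The function name `two_point_grad_estimate` and the smoothing parameter `mu` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def two_point_grad_estimate(f, x, mu=1e-3, num_directions=1, rng=None):
    """Estimate grad f(x) with 2-point (forward-difference) random queries.

    f               : callable returning a scalar loss for a parameter vector
    x               : current parameter vector (1-D numpy array)
    mu              : smoothing radius of the finite difference
    num_directions  : K, number of random directions averaged over
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    grad = np.zeros(d)
    for _ in range(num_directions):
        u = rng.standard_normal(d)          # random Gaussian direction
        diff = (f(x + mu * u) - f(x)) / mu  # directional finite difference
        grad += diff * u
    return grad / num_directions

# Usage: one ZO-SGD-style update on a toy quadratic loss.
if __name__ == "__main__":
    f = lambda w: 0.5 * np.dot(w, w)
    w = np.ones(5)
    g = two_point_grad_estimate(f, w, mu=1e-3, num_directions=4)
    w = w - 0.01 * g                        # gradient step with lr = 0.01
```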
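Similarly, the step-decay learning-rate schedule and the (T, γ) grid search described in the Experiment Setup row can be sketched as follows, assuming a plain "multiply by γ every T iterations" rule; the helper names `step_decay_lr` and `grid_search_schedule` are hypothetical and only reflect the grids {30, 60, 100, 150, 200, 250} and {0.6, 0.7, 0.8, 0.9} stated in the paper.

```python
from itertools import product

def step_decay_lr(lr0, gamma, T, iteration):
    """Learning rate after `iteration` steps: lr0 * gamma ** (iteration // T)."""
    return lr0 * (gamma ** (iteration // T))

def grid_search_schedule(train_fn, T_grid=(30, 60, 100, 150, 200, 250),
                         gamma_grid=(0.6, 0.7, 0.8, 0.9)):
    """Pick (T, gamma) minimizing the validation loss returned by train_fn."""
    best = None
    for T, gamma in product(T_grid, gamma_grid):
        val_loss = train_fn(T=T, gamma=gamma)   # user-supplied training run
        if best is None or val_loss < best[0]:
            best = (val_loss, T, gamma)
    return best[1], best[2]

# Usage with a dummy objective standing in for an actual ZO training run.
if __name__ == "__main__":
    dummy = lambda T, gamma: abs(T - 100) + abs(gamma - 0.8)
    T_best, gamma_best = grid_search_schedule(dummy)
    print(T_best, gamma_best, step_decay_lr(0.01, gamma_best, T_best, 500))
```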