General Stability Analysis for Zeroth-Order Optimization Algorithms

Authors: Xinyue Liu, Hualin Zhang, Bin Gu, Hong Chen

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "NUMERICAL EXPERIMENTS: In this section, we assess the generalization errors associated with optimizing nonconvex loss functions using ZO-GD, ZO-SGD, and ZO-SVRG. The primary goal is to verify the generalization errors of different zeroth-order optimization algorithms and different gradient estimators. To achieve this, we conduct experiments on two nonconvex models: nonconvex logistic regression and a two-layer neural network." (A sketch of the two-point estimator and ZO-SGD update appears after the table.)
Researcher Affiliation | Academia | 1. College of Informatics, Huazhong Agricultural University, China; 2. Engineering Research Center of Intelligent Technology for Agriculture, China; 3. School of Artificial Intelligence, Jilin University, China; 4. Mohamed bin Zayed University of Artificial Intelligence, UAE
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper does not state that source code for the described methodology is publicly available.
Open Datasets | Yes | "For both nonconvex models, we utilize the LIBSVM Australian dataset."
Dataset Splits | Yes | "We separate the dataset into two parts: 80% for training and 20% for testing." (A loading-and-splitting sketch appears after the table.)
Hardware Specification | No | No specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running the experiments are provided.
Software Dependencies | No | The paper does not list its software dependencies or their version numbers.
Experiment Setup | Yes | "For all experiments, we set the maximum number of iterations to 2000. The batch size of the stochastic gradient is set to 50. The initial learning rate is set to 0.01 and is decreased every T iterations by a factor of γ. Both T and γ are determined through a grid search, with T chosen from {30, 60, 100, 150, 200, 250} and γ from {0.6, 0.7, 0.8, 0.9}. For the 2-point gradient estimator, we also conduct a grid search over the parameter K, chosen from {2, 3, 4, 6, 8, 9, 12}." (A schedule-and-grid-search sketch appears after the table.)
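
To make the quoted setup concrete, here is a minimal Python sketch of a two-point zeroth-order gradient estimator and a plain ZO-SGD loop. The function names (zo_grad_2point, zo_sgd), the smoothing radius mu, and the Gaussian direction sampling are illustrative assumptions, not the paper's exact implementation.

    import numpy as np

    def zo_grad_2point(loss, w, mu=1e-3, K=4, rng=None):
        # Two-point finite-difference gradient estimate, averaged over K
        # random Gaussian directions. mu and the default K are assumed
        # values for illustration; the paper tunes K by grid search.
        rng = np.random.default_rng() if rng is None else rng
        g = np.zeros_like(w)
        for _ in range(K):
            u = rng.standard_normal(w.shape)
            g += (loss(w + mu * u) - loss(w - mu * u)) / (2.0 * mu) * u
        return g / K

    def zo_sgd(batch_loss, w0, sample_batch, lr=0.01, iters=2000):
        # Plain ZO-SGD: the true mini-batch gradient is replaced by the
        # zeroth-order estimate above. sample_batch() is a hypothetical
        # helper returning a random mini-batch (the paper uses size 50).
        w = w0.copy()
        for _ in range(iters):
            batch = sample_batch()
            w = w - lr * zo_grad_2point(lambda v: batch_loss(v, batch), w)
        return w

ZO-GD follows the same pattern with the full dataset in place of a mini-batch, and ZO-SVRG additionally mixes in a periodically recomputed full-batch estimate for variance reduction.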
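The dataset and split quoted above can be reproduced as follows; the scikit-learn tooling and the local file path "australian" are our assumptions, since the paper does not name its loading code.

    from sklearn.datasets import load_svmlight_file
    from sklearn.model_selection import train_test_split

    # Load the LIBSVM Australian dataset from a local copy and apply the
    # 80% train / 20% test split described in the paper.
    X, y = load_svmlight_file("australian")
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0)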
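Finally, a sketch of the step-decay schedule and the grid search over T and γ from the Experiment Setup row. The grids are the ones reported in the paper; run_experiment is a hypothetical driver (not from the paper) that trains for 2000 iterations with the given schedule and returns a test error.

    import itertools

    def step_decay(lr0, t, T, gamma):
        # Learning rate after t iterations: start at lr0 and multiply by
        # gamma once every T iterations, matching the quoted schedule.
        return lr0 * gamma ** (t // T)

    def tune_schedule(run_experiment):
        # Exhaustive search over the paper's reported grids for T and gamma;
        # run_experiment(T, gamma) -> test error is assumed to be supplied
        # by the caller.
        grid = itertools.product([30, 60, 100, 150, 200, 250],
                                 [0.6, 0.7, 0.8, 0.9])
        return min(grid, key=lambda tg: run_experiment(*tg))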