Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients

Authors: Hualin Zhang, Huan Xiong, Bin Gu

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Numerical Experiments. Octopus Function. We first consider the octopus function proposed by Du et al. [12]. The octopus function has 2^d local optima, x = (±4τ, …, ±4τ)^T, and 2^d − 1 saddle points: (0, …, 0)^T, (±4τ, 0, …, 0)^T, …, (±4τ, …, ±4τ, 0)^T. We compare ZO-GD-NCF, ZPSGD, PAGD, and RSPI on the octopus function with growing dimensions. The parameters of the octopus function are set to τ = e, L = e, γ = 1. All algorithms are initialized at the point (0, …, 0)^T, which is a strict saddle point and, among the 2^d − 1 saddle points, the one farthest from the optimal points. We set ϵ = 1e−4, δ = √(ρϵ) for all experiments and report the function value vs. the number of function queries in Figure 1.
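The ZO (zeroth-order) algorithms compared in this experiment access f only through function-value queries. As a minimal illustration of how such methods recover gradient information from queries alone, the sketch below implements a standard two-point Gaussian-smoothing gradient estimator; this is a generic construction for context, not the paper's exact estimator, and the function name and defaults are illustrative.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-6, num_dirs=100, rng=None):
    """Two-point zeroth-order gradient estimate.

    Averages directional finite differences (f(x + mu*u) - f(x)) / mu
    along random Gaussian directions u; since E[u u^T] = I, the
    average approximates the true gradient as num_dirs grows.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(num_dirs):
        u = rng.standard_normal(d)
        g += (f(x + mu * u) - f(x)) / mu * u
    return g / num_dirs
```

Each estimate costs num_dirs + 1 function queries, which is why the experiment reports progress against the number of function queries rather than iterations.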
Researcher Affiliation | Academia | Hualin Zhang¹, Huan Xiong²·³, Bin Gu¹·³ — ¹Nanjing University of Information Science & Technology; ²Harbin Institute of Technology; ³Mohamed bin Zayed University of Artificial Intelligence
Pseudocode | Yes | Algorithm 1: ZO-NCF-Online-Weak(f, x0, δ) … Algorithm 2: ZO-NCF-Online — Input: f(·) = (1/n) Σ_{i=1}^n f_i(·), x0, δ > 0, p ∈ (0, 1] … Algorithm 3: ZO-NCF-Deterministic — Input: function f(·), point x0, negative curvature δ > 0, confidence p ∈ (0, 1] … Algorithm 4: ZO-SGD-NCF — Input: function f, starting point x0, confidence p ∈ (0, 1), ϵ > 0, δ > 0 … Algorithm 5: ZO-GD-NCF — Input: function f, starting point x0, confidence p ∈ (0, 1), ϵ > 0, δ > 0.
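Algorithms 1–3 above aim to return a direction along which the Hessian's curvature is below −δ, using only function evaluations. The toy sketch below demonstrates that goal by building a full finite-difference Hessian, which costs O(d²) queries; the paper's algorithms achieve the same end far more query-efficiently, so this is only an illustration of the problem being solved, with an invented function name and step size.

```python
import numpy as np

def zo_negative_curvature(f, x, delta, h=1e-4):
    """Toy zeroth-order negative-curvature check (illustrative only).

    Approximates every Hessian entry with a second-order finite
    difference of f, then returns a unit direction whose curvature
    is below -delta if one exists, and None otherwise.
    """
    d = x.shape[0]
    H = np.zeros((d, d))
    e = np.eye(d)
    for i in range(d):
        for j in range(d):
            H[i, j] = (f(x + h * e[i] + h * e[j]) - f(x + h * e[i])
                       - f(x + h * e[j]) + f(x)) / h**2
    H = 0.5 * (H + H.T)          # symmetrize against finite-difference noise
    lam, V = np.linalg.eigh(H)   # eigenvalues in ascending order
    if lam[0] < -delta:
        return V[:, 0]           # unit eigenvector of most negative curvature
    return None
```

At a strict saddle point such as the octopus function's origin, the returned direction is a descent direction along which gradient-based steps can escape.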
Open Source Code | No | The paper's reproducibility checklist answers 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]', but neither a specific URL nor an explicit statement in the main body indicates where the code is available.
Open Datasets | Yes | We first consider the octopus function proposed by Du et al. [12].
Dataset Splits | No | The numerical experiments use a synthetic benchmark (the octopus function), which involves no training/validation/test splits in the conventional sense; the paper specifies the function's parameters and the initialization, but no data partitioning.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for software components or dependencies used in the experiments.
Experiment Setup | Yes | The parameters of the octopus function are set to τ = e, L = e, γ = 1. All algorithms are initialized at the point (0, …, 0)^T, which is a strict saddle point and, among the 2^d − 1 saddle points, the one farthest from the optimal points. We set ϵ = 1e−4 and δ = √(ρϵ) for all experiments. For RSPI, we follow the hyperparameter update strategy described in ([30], Appendix, Section F): we keep σ2 constant and update σ1 ← ρ_{σ1}·σ1 every T_{σ1} iterations. We conduct a grid search over T_{σ1} and ρ_{σ1}.
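The grid search over T_{σ1} and ρ_{σ1} described above can be sketched as follows. The run_rspi callable and the selection criterion (lowest final function value) are assumptions for illustration; the paper does not specify this interface.

```python
import itertools

def grid_search_rspi(run_rspi, T_grid, rho_grid):
    """Hypothetical grid search over RSPI's (T_sigma1, rho_sigma1).

    run_rspi(T, rho) is assumed to run RSPI with the given update
    schedule and return the final function value; the pair with the
    lowest value is selected.
    """
    best = None
    for T, rho in itertools.product(T_grid, rho_grid):
        val = run_rspi(T, rho)
        if best is None or val < best[0]:
            best = (val, T, rho)
    return best  # (best value, best T_sigma1, best rho_sigma1)
```

Exhaustive enumeration is reasonable here because only two hyperparameters are tuned, so the grid stays small.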