Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients
Authors: Hualin Zhang, Huan Xiong, Bin Gu
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Numerical Experiments — Octopus Function. We first consider the octopus function proposed by Du et al. [12]. The octopus function has 2^d local optima, x = (±4τ, …, ±4τ)^T, and 2^d − 1 saddle points: (0, …, 0)^T, (±4τ, 0, …, 0)^T, …, (±4τ, …, ±4τ, 0)^T. We compare ZO-GD-NCF, ZPSGD, PAGD, and RSPI on the octopus function with growing dimensions. The parameters of the octopus function are set to τ = e, L = e, γ = 1. All algorithms are initialized at the point (0, …, 0)^T, which is a strict saddle point and the one farthest from the optimal points among the 2^d − 1 saddle points. We set ϵ = 1e−4, δ = √(ρϵ) for all experiments and report the function value vs. the number of function queries in Figure 1. |
| Researcher Affiliation | Academia | Hualin Zhang1 Huan Xiong2,3 Bin Gu1,3 1 Nanjing University of Information Science & Technology 2 Harbin Institute of Technology 3 Mohamed bin Zayed University of Artificial Intelligence |
| Pseudocode | Yes | Algorithm 1 ZO-NCF-Online-Weak(f, x0, δ) ... Algorithm 2 ZO-NCF-Online — Input: f(·) = (1/n) Σ_{i=1}^n f_i(·), x0, δ > 0, p ∈ (0, 1]. ... Algorithm 3 ZO-NCF-Deterministic — Input: function f(·), point x0, negative curvature δ > 0, confidence p ∈ (0, 1]. ... Algorithm 4 ZO-SGD-NCF — Input: function f, starting point x0, confidence p ∈ (0, 1), ϵ > 0 and δ > 0. ... Algorithm 5 ZO-GD-NCF — Input: function f, starting point x0, confidence p ∈ (0, 1), ϵ > 0 and δ > 0. |
| Open Source Code | No | The paper states in its reproducibility checklist: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]', but it does not provide a specific URL or an explicit statement within the main body of the paper indicating where the code is available. |
| Open Datasets | Yes | We first consider the octopus function proposed by Du et al. [12]. |
| Dataset Splits | No | The paper uses a synthetic benchmark function ("octopus function") for numerical experiments, which does not involve explicit training/validation/test dataset splits in the conventional sense. It specifies parameters for the function and initialization but not data partitioning. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software components or dependencies used in the experiments. |
| Experiment Setup | Yes | The parameters of the octopus function are set to τ = e, L = e, γ = 1. All algorithms are initialized at the point (0, …, 0)^T, which is a strict saddle point and the one farthest from the optimal points among the 2^d − 1 saddle points. We set ϵ = 1e−4, δ = √(ρϵ) for all experiments. For RSPI, we follow the hyperparameter update strategy described in ([30], Appendix, Section F): we keep σ2 constant and update σ1 ← ρ_σ1 · σ1 every T_σ1 iterations. We conduct a grid search over T_σ1 and ρ_σ1. |
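To make the two zeroth-order ingredients the paper builds on concrete, here is a minimal, illustrative sketch: a coordinate-wise two-point finite-difference gradient estimator, and a negative-curvature search via power iteration on zeroth-order Hessian-vector products. This is not the paper's ZO-GD-NCF algorithm or the octopus function; the toy objective, step sizes, and function names below are all assumptions chosen for illustration.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-5):
    """Two-point finite-difference gradient estimate, 2d function queries."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = mu
        g[i] = (f(x + e) - f(x - e)) / (2 * mu)
    return g

def neg_curvature_dir(f, x, eta=0.5, h=1e-3, steps=20, seed=0):
    """Sketch of negative-curvature finding without gradients:
    power iteration on (I - eta*H), where the Hessian-vector product H v
    is itself approximated from zeroth-order gradient estimates."""
    v = np.random.default_rng(seed).standard_normal(x.size)
    v /= np.linalg.norm(v)
    for _ in range(steps):
        hv = (zo_gradient(f, x + h * v) - zo_gradient(f, x - h * v)) / (2 * h)
        v = v - eta * hv          # amplifies the most negative eigendirection
        v /= np.linalg.norm(v)
    return v

# Toy strict-saddle objective (NOT the octopus function):
# f has a strict saddle at the origin, with negative curvature along x[1].
def f(x):
    return x[0] ** 2 - x[1] ** 2

# Zeroth-order gradient descent from a slightly perturbed start escapes.
x = np.array([0.5, 1e-3])
for _ in range(100):
    x -= 0.1 * zo_gradient(f, x)
```

Note that plain zeroth-order gradient descent stalls when initialized exactly at the saddle (the estimated gradient is zero there), which is exactly why the paper's ZO-NCF routines extract a negative-curvature direction such as `neg_curvature_dir(f, np.zeros(2))` to move along.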