Gradientless Descent: High-Dimensional Zeroth-Order Optimization
Authors: Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi Zhang
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested GLD algorithms on a simple class of objective functions and compared them to Accelerated Random Search (ARS) by Nesterov & Spokoiny (2011), which has linear convergence guarantees on strongly convex and strongly smooth functions. |
| Researcher Affiliation | Industry | Daniel Golovin, John Karro, Greg Kochanski, Chansoo Lee, Xingyou Song, Qiuyi (Richard) Zhang; Google Brain; {dgg,karro,gpk,chansoo,xingyousong,qiuyiz}@google.com |
| Pseudocode | Yes | Algorithm 1: Gradientless Descent with Binary Search (GLD-Search); Algorithm 2: Gradientless Descent with Fast Binary Search (GLD-Fast). A hedged sketch of GLD-Search appears after the table. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We tested GLD algorithms on a simple class of objective functions and compared them to Accelerated Random Search (ARS) by Nesterov & Spokoiny (2011), which has linear convergence guarantees on strongly convex and strongly smooth functions. To show the practicality of GLD in practical and non-convex settings, we also test GLD algorithms on a variety of Black Box Optimization Benchmarking (BBOB) functions (Hansen et al., 2009). We also ran experiments on the Mujoco benchmarks with varying architectures, both linear and nonlinear. |
| Dataset Splits | No | The paper does not specify training, validation, or test splits; it evaluates optimization algorithms on benchmark functions and simulated environments rather than training data-driven models, so conventional dataset splits do not apply. |
| Hardware Specification | No | The paper does not explicitly specify the hardware used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies used in the experiments. |
| Experiment Setup | Yes | Our objective function is then $f_{\alpha,\beta,n} : \mathbb{R}^n \to \mathbb{R}$ with $f_{\alpha,\beta,n}(x) = \frac{1}{2} x^\top H_{\alpha,\beta,n} x$, which is $\alpha$-strongly convex and $\beta$-strongly smooth. We always use the same starting point $x = \frac{1}{n}(1, \ldots, 1)$, which requires X = Q for our algorithms. For each function, the optimum is known and we use the log optimality gap as a measure of competence. All other setup details are the same as before, such as using a fixed starting point. Because each function can exhibit varying forms of non-smoothness and convexity, all algorithms are run with a smoothness constant of 10 and a strong convexity constant of 0.1. We used a horizon of 1000 for all experiments. A sketch reproducing the quadratic setup appears below the table. |
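The pseudocode row above references Algorithm 1 (GLD-Search). As a rough illustration of its binary-search-over-radii idea, here is a minimal NumPy sketch; the function name `gld_search`, its arguments, and the ball-sampling details are our simplifying assumptions, not the paper's exact procedure.

```python
import numpy as np

def gld_search(f, x0, max_radius, min_radius, num_steps, rng=None):
    """Hedged sketch of GLD-Search: at each step, draw one random perturbation
    per radius on the geometric grid max_radius * 2**-k (down to min_radius),
    evaluate f at every candidate, and keep the best point seen."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    levels = int(np.ceil(np.log2(max_radius / min_radius)))  # binary-search depth
    for _ in range(num_steps):
        best_x, best_fx = x, fx
        for k in range(levels + 1):
            radius = max_radius * 2.0 ** (-k)
            direction = rng.normal(size=x.shape)
            direction /= np.linalg.norm(direction)
            # uniform sample from the ball of the current radius
            v = radius * rng.uniform() ** (1.0 / x.size) * direction
            candidate = x + v
            f_candidate = f(candidate)
            if f_candidate < best_fx:
                best_x, best_fx = candidate, f_candidate
        x, fx = best_x, best_fx  # keep the best of the old point and all candidates
    return x, fx
```

Because each update keeps the best of the current point and all sampled candidates, the objective value is non-increasing from step to step, and no gradient information or line search is used.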
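For the experiment-setup row, here is a hedged sketch of the strongly convex quadratic benchmark, reusing `gld_search` from the sketch above. The diagonal form and linear eigenvalue spacing of $H_{\alpha,\beta,n}$, the dimension, and the radius settings are illustrative assumptions; the paper specifies its own construction and evaluation budget.

```python
import numpy as np

def make_quadratic(alpha=0.1, beta=10.0, n=20):
    """Quadratic f(x) = 0.5 * x^T H x with eigenvalues of H in [alpha, beta],
    hence alpha-strongly convex and beta-strongly smooth. The linear eigenvalue
    spacing is an assumption, not the paper's construction of H."""
    H = np.diag(np.linspace(alpha, beta, n))
    return lambda x: 0.5 * x @ H @ x

n = 20                              # illustrative dimension (not taken from the paper)
f = make_quadratic(alpha=0.1, beta=10.0, n=n)
x0 = np.ones(n) / n                 # fixed starting point, as described in the setup row
x_final, f_final = gld_search(f, x0, max_radius=1.0, min_radius=1e-4, num_steps=1000)
print("log optimality gap:", np.log10(f_final))  # the optimum value is 0 here
```

Since the minimizer of this quadratic is the origin with optimal value 0, the printed quantity is the log optimality gap used as the performance measure in the paper.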