Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation
Authors: Sebastian E Ament, Carla P Gomes
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Experiments): 4.1 Scaling on Synthetic Data; 4.2 Comparison to Prior Work; 4.3 Bayesian Optimization. Figure 2: Benchmarks of matrix-vector multiplications with the gradient (top) and Hessian (bottom) kernel matrices using a rational quadratic kernel. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Cornell University, Ithaca, NY, 14850, USA. Correspondence to: Sebastian Ament <ament@cs.cornell.edu>. |
| Pseudocode | Yes | Algorithm 1 Bayesian Optimization with Restarts |
| Open Source Code | Yes | and make our implementation publicly available (footnote 1: github.com/SebastianAment/CovarianceFunctions.jl) |
| Open Datasets | Yes | We benchmark both Bayesian and canonical optimization algorithms with and without gradient information on some of the test functions given by Bingham and Surjanovic (2013), namely, the Griewank, Ackley, and Rastrigin functions. See Section F for the definitions of the test functions. |
| Dataset Splits | No | The paper does not specify a validation set or explicit training/validation/test split percentages for the datasets used in the experiments. It refers to 'test functions' and 'independent experiments' but not specific data splits for reproduction beyond the problem definition. |
| Hardware Specification | No | The paper mentions that experiments were run 'with 24 threads on 12 cores in parallel' but does not provide specific hardware details such as CPU or GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions 'Julia (Bezanson et al., 2017)' and 'ForwardDiff.jl (Revels et al., 2016)' as software used, but does not provide specific version numbers for these or any other dependencies. |
| Experiment Setup | Yes | For all functions, we scaled the input domains to lie in [-1, 1]^d, scaled the output to lie in [0, 1], and shifted the global optimum of all functions to 1_d/4. All the BO variants use the expected improvement acquisition function, which is numerically optimized w.r.t. the next observation point using L-BFGS. If the proposed next observation lies within 10^-4 of any previously observed point, we choose a random point instead (see Algorithm 1). A minimal sketch of this restart rule follows the table. |
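
The restart rule quoted in the Experiment Setup row is straightforward to reproduce. The sketch below is written in Julia (the language of the paper's implementation) and shows one plausible reading of that rule; the function name `accept_or_restart`, the column-per-observation layout of `X_observed`, and the example data are illustrative assumptions, not taken from the paper's code.

```julia
using LinearAlgebra  # for norm

# Restart rule as quoted from the paper: if the point proposed by maximizing
# expected improvement lies within `tol` of any previously observed point,
# fall back to a uniformly random point in the scaled domain [-1, 1]^d.
# Names and the observations-as-columns convention are assumptions for this sketch.
function accept_or_restart(x_proposed::AbstractVector, X_observed::AbstractMatrix; tol = 1e-4)
    d = length(x_proposed)
    too_close = any(norm(x_proposed - X_observed[:, j]) < tol for j in axes(X_observed, 2))
    return too_close ? (2 .* rand(d) .- 1) : x_proposed  # random point in [-1, 1]^d on restart
end

# Hypothetical usage: three previously observed 2-d points, one per column.
X_obs = [-0.5 0.0  0.25;
          0.3 0.1 -0.75]
x_next = accept_or_restart([0.0, 0.1] .+ 1e-5, X_obs)  # within 1e-4 of column 2, so a random restart is returned
```

The point of the rule is to keep the optimizer from resampling (numerically) duplicate locations, which would add no information to the surrogate and can make the kernel matrix ill-conditioned; choosing a random point in the scaled domain is the fallback the paper describes in Algorithm 1.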