Scalable First-Order Bayesian Optimization via Structured Automatic Differentiation

Authors: Sebastian E. Ament, Carla P. Gomes

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 4 "Experiments" (4.1 "Scaling on Synthetic Data", 4.2 "Comparison to Prior Work", 4.3 "Bayesian Optimization") and Figure 2: "Benchmarks of matrix-vector multiplications with the gradient (top) and Hessian kernel matrices (bottom) using a rational quadratic kernel." (A dense reference for these gradient-kernel products is sketched after this table.)
Researcher Affiliation | Academia | "Department of Computer Science, Cornell University, Ithaca, NY, 14850, USA. Correspondence to: Sebastian Ament <ament@cs.cornell.edu>."
Pseudocode | Yes | "Algorithm 1: Bayesian Optimization with Restarts"
Open Source Code | Yes | "... and make our implementation publicly available." The footnote points to github.com/SebastianAment/CovarianceFunctions.jl.
Open Datasets | Yes | "We benchmark both Bayesian and canonical optimization algorithms with and without gradient information on some of the test functions given by Bingham and Surjanovic (2013), namely, the Griewank, Ackley, and Rastrigin functions. See Section F for the definitions of the test functions." (Textbook forms of these functions are sketched after this table.)
Dataset Splits | No | The paper does not specify a validation set or explicit training/validation/test split percentages for its experiments; it refers to "test functions" and "independent experiments" but defines no data splits beyond the problem definitions themselves.
Hardware Specification | No | The paper states that experiments were run "with 24 threads on 12 cores in parallel" but gives no specific hardware details such as CPU or GPU models or memory capacity.
Software Dependencies | No | The paper names Julia (Bezanson et al., 2017) and ForwardDiff.jl (Revels et al., 2016) as the software used, but provides no version numbers for these or any other dependencies.
Experiment Setup | Yes | "For all functions, we scaled the input domains to lie in [-1, 1]^d, scaled the output to lie in [0, 1], and shifted the global optimum of all functions to 1_d/4. All the BO variants use the expected improvement acquisition function, which is numerically optimized w.r.t. the next observation point using L-BFGS. If the proposed next observation lies within 10^-4 of any previously observed point, we choose a random point instead (see Algorithm 1)." (A runnable sketch of this loop follows the table.)
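The gradient and Hessian kernel matrices benchmarked in Figure 2 (Research Type row) can be made concrete with a small dense reference. The sketch below is not the paper's structured algorithm or the CovarianceFunctions.jl API; the rational quadratic kernel's shape and length-scale values are assumed, and the helper names (`rq`, `grad_block`, `grad_kernel_matrix`) are illustrative. It builds the (n·d) x (n·d) gradient kernel matrix by brute force with ForwardDiff.jl and multiplies it by a vector, which is exactly the operation the paper's structured matrix-vector multiplications are designed to accelerate.

```julia
# Dense reference for the gradient kernel matrix whose matrix-vector products
# Figure 2 benchmarks. NOT the paper's structured method; hyperparameters assumed.
using ForwardDiff, LinearAlgebra

# rational quadratic kernel (unit shape α and length scale ℓ assumed)
rq(x, y; α = 1.0, ℓ = 1.0) = (1 + sum(abs2, x - y) / (2α * ℓ^2))^(-α)

# d×d cross-derivative block ∂²k(x, y)/∂x∂y, read off the Hessian in (x, y)
function grad_block(k, x, y)
    d = length(x)
    H = ForwardDiff.hessian(z -> k(z[1:d], z[d+1:end]), vcat(x, y))
    return H[1:d, d+1:end]
end

# dense (n·d)×(n·d) gradient kernel matrix for a d×n matrix of points X
function grad_kernel_matrix(k, X)
    d, n = size(X)
    K = zeros(n * d, n * d)
    for i in 1:n, j in 1:n
        K[(i-1)*d+1:i*d, (j-1)*d+1:j*d] = grad_block(k, X[:, i], X[:, j])
    end
    return K
end

X = rand(3, 16)                    # d = 3 dimensions, n = 16 points
v = rand(3 * 16)
K = grad_kernel_matrix(rq, X)
@show norm(K * v)                  # the multiplication the structured method accelerates
```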
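For the Open Datasets row, the Griewank, Ackley, and Rastrigin functions are standard benchmarks, so their textbook forms are reproduced below as a convenience; the paper's own parameterization is the one in its Section F, and the output rescaling to [0, 1] from the Experiment Setup row is not reproduced here. The `shifted` helper is illustrative of the stated optimum shift to 1_d/4.

```julia
# Textbook definitions (not necessarily the paper's exact Section F parameterization)
griewank(x)  = 1 + sum(abs2, x) / 4000 - prod(cos(xi / sqrt(i)) for (i, xi) in enumerate(x))
ackley(x)    = -20 * exp(-0.2 * sqrt(sum(abs2, x) / length(x))) - exp(sum(cos, 2π .* x) / length(x)) + 20 + ℯ
rastrigin(x) = 10 * length(x) + sum(xi^2 - 10 * cos(2π * xi) for xi in x)

# Illustrative helper: all three have their global minimum at the origin, so
# evaluating at x .- 1/4 moves the optimum to 1_d/4 as stated in the setup row.
shifted(f, x) = f(x .- 1/4)
```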
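For the Experiment Setup row, here is a minimal, self-contained sketch of the quoted loop (Algorithm 1): expected improvement is maximized with L-BFGS, and a random point is substituted whenever the proposal lands within 10^-4 of a previous observation. A plain RBF GP with assumed hyperparameters stands in for the paper's gradient-observation surrogate, and all helper names (`rbf`, `gp_posterior`, `expected_improvement`, `bo_with_restarts`) are illustrative rather than taken from the paper's code.

```julia
using Optim, LinearAlgebra, Distributions

rbf(x, y; ℓ = 0.5) = exp(-sum(abs2, x - y) / (2ℓ^2))   # stand-in kernel, length scale assumed

# GP posterior mean and standard deviation at x; refit from scratch per call for brevity
function gp_posterior(X, y, x; σ² = 1e-6)
    K  = [rbf(X[:, i], X[:, j]) for i in axes(X, 2), j in axes(X, 2)] + σ² * I
    kx = [rbf(X[:, i], x) for i in axes(X, 2)]
    μ  = dot(kx, K \ y)
    s² = max(rbf(x, x) - dot(kx, K \ kx), 1e-12)
    return μ, sqrt(s²)
end

# expected improvement for minimization
function expected_improvement(X, y, x)
    μ, σ = gp_posterior(X, y, x)
    z = (minimum(y) - μ) / σ
    return (minimum(y) - μ) * cdf(Normal(), z) + σ * pdf(Normal(), z)
end

# Algorithm-1-style loop: maximize EI with L-BFGS, fall back to a random point
# if the proposal is within `tol` of any previously observed point
function bo_with_restarts(f, X, y; iterations = 32, tol = 1e-4)
    d = size(X, 1)
    for _ in 1:iterations
        negEI = x -> -expected_improvement(X, y, x)
        x0 = 2 .* rand(d) .- 1                            # random start in [-1, 1]^d
        xnext = clamp.(Optim.minimizer(optimize(negEI, x0, LBFGS())), -1, 1)
        if any(norm(xnext - X[:, i]) < tol for i in axes(X, 2))
            xnext = 2 .* rand(d) .- 1                     # restart rule from the quote above
        end
        X, y = hcat(X, xnext), vcat(y, f(xnext))
    end
    return X, y
end

# usage on a toy objective with its minimum at 1_d/4 (hypothetical stand-in)
d = 3
f(x) = sum(abs2, x .- 1/4)
X0 = 2 .* rand(d, 5) .- 1
Xs, ys = bo_with_restarts(f, X0, [f(X0[:, i]) for i in 1:5])
@show minimum(ys)
```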