Unexpected Improvements to Expected Improvement for Bayesian Optimization
Authors: Sebastian Ament, Samuel Daulton, David Eriksson, Maximilian Balandat, Eytan Bakshy
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show that members of the Log EI family of acquisition functions substantially improve on the optimization performance of their canonical counterparts and surprisingly, are on par with or exceed the performance of recent state-of-the-art acquisition functions, highlighting the understated role of numerical optimization in the literature. |
| Researcher Affiliation | Industry | Sebastian Ament Meta ament@meta.com Samuel Daulton Meta sdaulton@meta.com David Eriksson Meta deriksson@meta.com Maximilian Balandat Meta balandat@meta.com Eytan Bakshy Meta ebakshy@meta.com |
| Pseudocode | No | The paper includes mathematical formulations and descriptions but no explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | All of our methods are available as part of BoTorch [6]. |
| Open Datasets | Yes | In Figure 3, we compare performance on the Ackley and Michalewicz test functions [67]. Figure 4 shows results on four engineering design problems with black-box constraints that were also considered in [22]. Figure 6 shows the performance of Log EI on three high-dimensional problems: the 6-dimensional Hartmann function embedded in a 100-dimensional space, a 100-dimensional rover trajectory planning problem, and a 103-dimensional SVM hyperparameter tuning problem. For the laser plasma acceleration problem, we used the public data available at Irshad et al. [39] to fit an independent GP surrogate model to each objective. We only queried the surrogate at the highest fidelity to create a single-fidelity benchmark. [39] Faran Irshad, Stefan Karsch, and Andreas Doepp. Reference dataset of multi-objective and multifidelity optimization in laser-plasma acceleration, January 2023. URL https://doi.org/10.5281/zenodo.7565882. |
| Dataset Splits | No | The data generating process (DGP) for the training data used for the left plot of Figure 1 is the following: 80% of training points are sampled uniformly at random from the domain, while 20% are sampled according to a multivariate Gaussian centered at the function maximum with a standard deviation of 25% of the length of the domain. This describes the generation of data points for illustrative figures, not explicit train/validation/test splits for the main experiments. |
| Hardware Specification | Yes | Multi-objective acquisition function optimization wall time in seconds on CPU (2x Intel Xeon E5-2680 v4 @ 2.40GHz). |
| Software Dependencies | No | All experiments are implemented using BoTorch [6] and utilize multi-start optimization of the AF with scipy's L-BFGS-B optimizer. While specific software packages are mentioned, their version numbers are not provided. |
| Experiment Setup | Yes | All experiments are implemented using BoTorch [6] and utilize multi-start optimization of the AF with scipy's L-BFGS-B optimizer. In order to avoid conflating the effect of BoTorch's default initialization strategy with those of our contributions, we use 16 initial points chosen uniformly at random from which to start the L-BFGS-B optimization. We use a Matern-5/2 kernel with automatic relevance determination (ARD), i.e. separate length-scales for each input dimension, and a top-hat prior on the length-scales in [0.01, 100]. The input spaces are normalized to the unit hypercube and the objective values are standardized during each optimization iteration. |
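The numerical issue that motivates the paper's Log EI family can be reproduced with a few lines of standalone code. The sketch below (an illustration, not the authors' implementation, which uses asymptotic expansions for extreme inputs) computes the log of classic Expected Improvement for a Gaussian posterior N(mu, sigma^2) and shows how the naive quantity underflows to -inf far from the incumbent, which is exactly where gradients vanish during acquisition optimization:

```python
import math

def log_ei(mu, sigma, best_f):
    """Log of Expected Improvement under a Gaussian posterior N(mu, sigma^2).

    EI(x) = sigma * h(z), where z = (mu - best_f) / sigma and
    h(z) = phi(z) + z * Phi(z), with phi/Phi the standard normal pdf/cdf.
    This naive version illustrates the underflow problem; a numerically
    stable Log EI replaces log(h(z)) with asymptotic expansions for z << 0.
    """
    z = (mu - best_f) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # phi(z)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # Phi(z)
    h = pdf + z * cdf
    if h <= 0.0:
        # Underflow: EI is mathematically positive everywhere, but in double
        # precision it collapses to exactly 0, so its log is -inf and its
        # gradient is 0 -- L-BFGS-B stalls on such flat regions.
        return float("-inf")
    return math.log(sigma) + math.log(h)

# At the incumbent (mu == best_f, sigma == 1): log EI = log(1/sqrt(2*pi))
print(log_ei(0.0, 1.0, 0.0))   # ~ -0.9189

# Far below the incumbent the naive computation underflows to -inf,
# even though the true log EI is a finite (very negative) number.
print(log_ei(0.0, 1.0, 40.0))  # -inf
```

This is why the row above notes that the experiments use multi-start L-BFGS-B: with a stable log-space acquisition value, gradient-based optimizers no longer encounter the flat, exactly-zero regions shown in the second call.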