Maximizing acquisition functions for Bayesian optimization
Authors: James Wilson, Frank Hutter, Marc Deisenroth
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assessed the efficacy of gradient-based and submodular strategies for maximizing acquisition functions in two primary settings: synthetic, where the task f was drawn from a known GP prior, and black-box, where f's nature is unknown to the optimizer. |
| Researcher Affiliation | Academia | James T. Wilson (Imperial College London), Frank Hutter (University of Freiburg), Marc Peter Deisenroth (Imperial College London) |
| Pseudocode | Yes | Figure 1: (a) Pseudo-code for standard BO's outer loop with parallelism q; the inner optimization problem is boxed in red. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described, nor does it explicitly state that the code is publicly available. |
| Open Datasets | No | The paper describes using 'synthetic tasks' and 'black-box tasks' (Levy, Hartmann-6) but does not provide specific access information (link, DOI, repository, or formal citation with authors/year) for publicly available datasets. |
| Dataset Splits | No | The paper mentions '32 independent trials' and starting with 'three randomly chosen inputs' but does not specify exact dataset split percentages, sample counts, or reference predefined splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. It does not mention any hardware at all explicitly. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiment. |
| Experiment Setup | Yes | For ADAM, we used stochastic minibatches consisting of m = 128 samples and an initial learning rate of 1/40. To combat non-convexity, gradient ascent was run from a total of 32 (64) starting positions when greedily (jointly) maximizing L. |
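The multi-start gradient-ascent setup in the last row (Adam with a learning rate of 1/40, run from 32 random restarts) can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: the acquisition function `acq` here is a synthetic multimodal stand-in, and the names `adam_maximize` and `multistart_maximize` are illustrative.

```python
import numpy as np

def adam_maximize(f_grad, x0, lr=1/40, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=200):
    """Gradient *ascent* with Adam from a single starting position."""
    x = x0.copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, steps + 1):
        g = f_grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction
        v_hat = v / (1 - beta2**t)
        x = x + lr * m_hat / (np.sqrt(v_hat) + eps)  # '+' for ascent
        x = np.clip(x, 0.0, 1.0)  # stay inside a unit-cube domain
    return x

def multistart_maximize(f, f_grad, dim, n_starts=32, seed=0):
    """Combat non-convexity: restart Adam from n_starts random points
    and keep the candidate with the highest acquisition value."""
    rng = np.random.default_rng(seed)
    starts = rng.uniform(0.0, 1.0, size=(n_starts, dim))
    candidates = [adam_maximize(f_grad, x0) for x0 in starts]
    return max(candidates, key=f)

# Stand-in "acquisition": a smooth multimodal function (illustrative
# only; a real run would use, e.g., a Monte Carlo acquisition with
# m = 128 samples per gradient estimate, as in the paper).
def acq(x):
    return float(np.sum(np.sin(3 * np.pi * x) * np.exp(-(x - 0.5)**2)))

def acq_grad(x):
    w = np.exp(-(x - 0.5)**2)
    return (3 * np.pi * np.cos(3 * np.pi * x) * w
            - 2 * (x - 0.5) * np.sin(3 * np.pi * x) * w)

x_best = multistart_maximize(acq, acq_grad, dim=2, n_starts=32)
```

With 32 restarts the ascent reliably lands in one of the high-value basins; the greedy variant in the paper uses 32 starts per step, the joint variant 64.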