Stopping Bayesian Optimization with Probabilistic Regret Bounds

Authors: James T. Wilson

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "These findings are accompanied by empirical results which demonstrate the strengths and weaknesses of the proposed approach. ... Finally, Section 5 investigates its empirical performance under idealized and realistic circumstances."
Researcher Affiliation | Industry | "James T. Wilson, Morgan Stanley, New York, USA, james.t.wilson@morganstanley.com"
Pseudocode | Yes | "Algorithm 1 BO with Monte Carlo PRB ... Algorithm 2 Monte Carlo PRB" (a hedged sketch of this stopping check appears after the table)
Open Source Code | Yes | "code is available online at https://github.com/j-wilson/trieste_stopping."
Open Datasets | Yes | "Experiments were performed by first running BO with conservatively chosen budgets T N. We then stepped through each saved run with different stopping rules to establish stopping times and terminal performance. This paradigm ensured fair comparisons and reduced compute overheads. We performed a hundred independent BO runs for all problems other than hyperparameter tuning for convolutional neural networks (CNNs) on MNIST [14], where only fifty runs were carried out. ... income prediction [7]" (this replay paradigm is illustrated after the table)
Dataset Splits | No | The paper mentions generating "train-test splits" for certain problems but does not specify exact percentages or sample counts for training, validation, or test sets.
Hardware Specification | Yes | "Runtimes reported in Figure 3 were measured on an Apple M1 Pro chip using an off-the-shelf build of TensorFlow [1]."
Software Dependencies | No | The paper mentions software such as GPflow, Trieste, and TensorFlow, but does not specify their version numbers in the text. For example, in C.1: "We employed Gaussian process priors f ∼ GP(µ, k) ... using an off-the-shelf build of TensorFlow [1]."
Experiment Setup | Yes | "Each BO run was tasked with finding an ϵ-optimal point with probability at least 1 − δ = 95%. On the Rosenbrock-4 fine-tuning problem, we used a regret bound ϵ = 10⁻⁴. For CNNs, we aimed to be within ϵ = 0.5% of the best test error (i.e., misclassification rate) seen across all runs, namely 0.62%. Likewise, when fitting XGBoost classifiers [12] for income prediction [7], we sought to be within 1% of the best found test error of 12.89%. For all other problems, we set ϵ = 0.1. ... Each model was trained using Adam [24], with batches of size 64."
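To make the Pseudocode and Experiment Setup rows concrete, the following is a minimal sketch of a Monte Carlo probabilistic-regret-bound (PRB) stopping check in the spirit of Algorithm 2: stop once the posterior probability that the incumbent is ϵ-optimal is at least 1 − δ. Everything here is an assumption for illustration, not the authors' implementation: a scikit-learn GP surrogate stands in for the paper's GPflow/Trieste models, a fixed candidate grid stands in for the search space, and a plain frequency estimate replaces the paper's adaptive Monte Carlo procedure. The reference implementation is at https://github.com/j-wilson/trieste_stopping.

```python
# Hedged sketch of a Monte Carlo PRB stopping check (illustrative, not the paper's code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def prb_should_stop(model, candidates, incumbent, epsilon, delta, n_draws=2048, seed=0):
    """Return True if, under the GP posterior, the incumbent is within `epsilon`
    of the candidate-set minimum with estimated probability at least 1 - delta."""
    # Draw joint posterior samples over the incumbent and the candidate grid so the
    # incumbent value and the grid minimum come from the same sampled function.
    points = np.vstack([incumbent.reshape(1, -1), candidates])
    draws = model.sample_y(points, n_samples=n_draws, random_state=seed)  # (1 + n_cand, n_draws)
    incumbent_draws, grid_draws = draws[0], draws[1:]
    hits = incumbent_draws <= grid_draws.min(axis=0) + epsilon
    return hits.mean() >= 1.0 - delta


# Toy usage on a 1D minimization problem (hypothetical numbers).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 1))
y = np.sin(6.0 * X[:, 0]) + 0.05 * rng.standard_normal(12)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4).fit(X, y)

grid = np.linspace(0.0, 1.0, 256).reshape(-1, 1)
incumbent = X[np.argmin(y)]
print(prb_should_stop(gp, grid, incumbent, epsilon=0.1, delta=0.05))
```

Note that the paper's Algorithm 2 additionally controls the Monte Carlo estimation error when comparing against 1 − δ; the fixed `n_draws` above is a simplification.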
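The Open Datasets row describes an evaluation paradigm: run BO once per problem with a generous budget, save the run, then step through the saved iterations with different stopping rules to read off stopping times and terminal performance. The sketch below illustrates that replay loop only; the record layout, names, and numbers are hypothetical and not taken from the paper or its codebase.

```python
# Hedged sketch of replaying saved BO runs under different stopping rules.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepRecord:
    step: int                # BO iteration index within the saved run
    incumbent_value: float   # best observed objective value so far
    stop_statistic: float    # e.g. estimated P(incumbent is epsilon-optimal)


def replay_stopping_rule(
    history: List[StepRecord], should_stop: Callable[[StepRecord], bool]
) -> Tuple[int, float]:
    """Step through a saved BO run and report when the rule would have stopped."""
    for record in history:
        if should_stop(record):
            return record.step, record.incumbent_value
    last = history[-1]
    return last.step, last.incumbent_value  # rule never fired; budget exhausted


# Example: a PRB-style rule that stops once the statistic clears 1 - delta = 0.95.
saved_run = [StepRecord(t, 1.0 / (t + 1), min(1.0, 0.05 * t)) for t in range(50)]
print(replay_stopping_rule(saved_run, lambda r: r.stop_statistic >= 0.95))
```

Because every rule is evaluated on the same cached trajectories, comparisons between stopping rules are paired and no extra objective evaluations are needed, which matches the report's note about fair comparisons and reduced compute overheads.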