Stopping Bayesian Optimization with Probabilistic Regret Bounds

Authors: James T. Wilson

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "These findings are accompanied by empirical results which demonstrate the strengths and weaknesses of the proposed approach. ... Finally, Section 5 investigates its empirical performance under idealized and realistic circumstances."
Researcher Affiliation | Industry | "James T. Wilson, Morgan Stanley, New York, USA, james.t.wilson@morganstanley.com"
Pseudocode | Yes | "Algorithm 1 BO with Monte Carlo PRB ... Algorithm 2 Monte Carlo PRB" (a hedged sketch of this stopping check appears after the table)
Open Source Code | Yes | "code is available online at https://github.com/j-wilson/trieste_stopping."
Open Datasets | Yes | "Experiments were performed by first running BO with conservatively chosen budgets T N. We then stepped through each saved run with different stopping rules to establish stopping times and terminal performance. This paradigm ensured fair comparisons and reduced compute overheads. We performed a hundred independent BO runs for all problems other than hyperparameter tuning for convolutional neural networks (CNNs) on MNIST [14], where only fifty runs were carried out. ... income prediction [7]" (this replay paradigm is illustrated after the table)
Dataset Splits | No | The paper mentions generating "train-test splits" for certain problems but does not specify exact percentages or sample counts for training, validation, or test sets.
Hardware Specification | Yes | "Runtimes reported in Figure 3 were measured on an Apple M1 Pro chip using an off-the-shelf build of TensorFlow [1]."
Software Dependencies | No | The paper mentions software such as GPflow, Trieste, and TensorFlow, but does not specify their version numbers in the text. For example, in C.1: "We employed Gaussian process priors f ∼ GP(µ, k) ... using an off-the-shelf build of TensorFlow [1]."
Experiment Setup | Yes | "Each BO run was tasked with finding an ϵ-optimal point with probability at least 1 − δ = 95%. On the Rosenbrock-4 fine-tuning problem, we used a regret bound ϵ = 10⁻⁴. For CNNs, we aimed to be within ϵ = 0.5% of the best test error (i.e., misclassification rate) seen across all runs, namely 0.62%. Likewise, when fitting XGBoost classifiers [12] for income prediction [7], we sought to be within 1% of the best found test error of 12.89%. For all other problems, we set ϵ = 0.1. ... Each model was trained using Adam [24], with batches of size 64."
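To make the Pseudocode and Experiment Setup rows concrete, the following is a minimal sketch of a Monte Carlo probabilistic-regret-bound (PRB) stopping check in the spirit of Algorithm 2: stop once the posterior probability that the incumbent is ϵ-optimal is at least 1 − δ. Everything here is an assumption for illustration, not the authors' implementation: a scikit-learn GP surrogate stands in for the paper's GPflow/Trieste models, a fixed candidate grid stands in for the search space, and a plain frequency estimate replaces the paper's adaptive Monte Carlo procedure. The reference implementation is at https://github.com/j-wilson/trieste_stopping.

```python
# Hedged sketch of a Monte Carlo PRB stopping check (illustrative, not the paper's code).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def prb_should_stop(model, candidates, incumbent, epsilon, delta, n_draws=2048, seed=0):
    """Return True if, under the GP posterior, the incumbent is within `epsilon`
    of the candidate-set minimum with estimated probability at least 1 - delta."""
    # Draw joint posterior samples over the incumbent and the candidate grid so the
    # incumbent value and the grid minimum come from the same sampled function.
    points = np.vstack([incumbent.reshape(1, -1), candidates])
    draws = model.sample_y(points, n_samples=n_draws, random_state=seed)  # (1 + n_cand, n_draws)
    incumbent_draws, grid_draws = draws[0], draws[1:]
    hits = incumbent_draws <= grid_draws.min(axis=0) + epsilon
    return hits.mean() >= 1.0 - delta


# Toy usage on a 1D minimization problem (hypothetical numbers).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(12, 1))
y = np.sin(6.0 * X[:, 0]) + 0.05 * rng.standard_normal(12)
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-4).fit(X, y)

grid = np.linspace(0.0, 1.0, 256).reshape(-1, 1)
incumbent = X[np.argmin(y)]
print(prb_should_stop(gp, grid, incumbent, epsilon=0.1, delta=0.05))
```

Note that the paper's Algorithm 2 additionally controls the Monte Carlo estimation error when comparing against 1 − δ; the fixed `n_draws` above is a simplification.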
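The Open Datasets row describes an evaluation paradigm: run BO once per problem with a generous budget, save the run, then step through the saved iterations with different stopping rules to read off stopping times and terminal performance. The sketch below illustrates that replay loop only; the record layout, names, and numbers are hypothetical and not taken from the paper or its codebase.

```python
# Hedged sketch of replaying saved BO runs under different stopping rules.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepRecord:
    step: int                # BO iteration index within the saved run
    incumbent_value: float   # best observed objective value so far
    stop_statistic: float    # e.g. estimated P(incumbent is epsilon-optimal)


def replay_stopping_rule(
    history: List[StepRecord], should_stop: Callable[[StepRecord], bool]
) -> Tuple[int, float]:
    """Step through a saved BO run and report when the rule would have stopped."""
    for record in history:
        if should_stop(record):
            return record.step, record.incumbent_value
    last = history[-1]
    return last.step, last.incumbent_value  # rule never fired; budget exhausted


# Example: a PRB-style rule that stops once the statistic clears 1 - delta = 0.95.
saved_run = [StepRecord(t, 1.0 / (t + 1), min(1.0, 0.05 * t)) for t in range(50)]
print(replay_stopping_rule(saved_run, lambda r: r.stop_statistic >= 0.95))
```

Because every rule is evaluated on the same cached trajectories, comparisons between stopping rules are paired and no extra objective evaluations are needed, which matches the report's note about fair comparisons and reduced compute overheads.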