Sample-efficient Bayesian Optimisation Using Known Invariances

Authors: Theodore Brown, Alexandru Cioba, Ilija Bogunovic

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We apply these algorithms to synthetic and nuclear fusion optimisation tasks, and show that incorporating the underlying invariance into the kernel significantly improves the sample efficiency. We also investigate the behaviour of the algorithms when the full invariant kernel is not known, and provide an empirical study of the performance of invariance-aware BO on target functions that are quasi-invariant (modelled as a sum of invariant and non-invariant components). Our main contribution is the derivation of novel upper and lower bounds on the regret achieved by invariance-aware BO algorithms.
Researcher Affiliation Collaboration Theodore Brown (1,2), Alexandru Cioba (3), Ilija Bogunovic (1); equal contribution. (1) University College London, (2) United Kingdom Atomic Energy Authority, (3) MediaTek Research. Contact: {theodore.brown.23, i.bogunovic}@ucl.ac.uk, alexandru.cioba@mtkresearch.com
Pseudocode Yes
Algorithm 1: Maximum variance reduction (MVR) [42]
  Input: k_G, X
  1: Initialize D = {}
  2: for t = 1, ..., T do
  3:   Select x_t = argmax_{x ∈ X} σ_{t-1}(x)
  4:   Observe y_t = f(x_t) + η_t
  5:   Append (x_t, y_t) to D
  6:   Update µ_t, σ_t from Eqs. 9, 10
  7: end for
  8: return x̂_opt^MVR = argmax_{x ∈ X} µ_T(x)
Algorithm 2: Upper confidence bound (UCB) [37]
  Input: k_G, X
  1: Initialize D = {}
  2: for t = 1, ..., T do
  3:   Select x_t = argmax_{x ∈ X} µ_{t-1}(x) + β σ_{t-1}(x)
  4:   Observe y_t = f(x_t) + η_t
  5:   Append (x_t, y_t) to D
  6:   Update µ_t, σ_t from Eqs. 9, 10
  7: end for
  8: return x̂_opt^UCB = x_T
(A runnable sketch of both loops is given after the table.)
Open Source Code Yes Code for our experiments and implementations of invariant kernels can be found on GitHub: github.com/theo-brown/invariantkernels and github.com/theo-brown/bayesopt_with_invariances. (A sketch of the orbit-averaging construction behind such invariant kernels is given after the table.)
Open Datasets No The paper describes generating synthetic objective functions by sampling from a Gaussian Process prior and fitting noise parameters (Appendix B.1). For the real-world application, it states "The kernel hyperparameters are learned offline, based on 256 runs of the fusion simulator with parameters generated by low-discrepancy Sobol sampling" (Appendix B.2). No concrete access information (link, DOI, repository, or formal citation to a publicly available dataset) is provided for either the synthetic data generation process itself or the fusion simulator data.
Dataset Splits No The paper describes the generation of objective functions for synthetic experiments and data from a fusion simulator, but it does not specify explicit train, validation, or test dataset splits in the traditional sense for a fixed dataset. The evaluation is primarily based on regret over optimization steps, which is a continuous process of querying points, rather than predefined data partitions.
Hardware Specification Yes Figure 6: Effect of group size on cost of data augmentation and invariant kernel methods. Benchmark task is to fit a 6D GP with the given kernel to 100 random datapoints from PermInv-6D. Shown are results from 100 seeds, 64 repeats per seed, performed on one NVIDIA V100-32GB GPU using BoTorch.
Software Dependencies No The paper mentions the use of BoTorch in its experimental details (Appendix B.1, B.2, and the Figure 6 caption), citing the 2020 NeurIPS paper [2]. However, it does not specify which BoTorch release was used, nor does it list version numbers for other key software dependencies such as Python, PyTorch, or CUDA.
Experiment Setup Yes We use an isotropic Matérn-5/2 kernel as the base kernel k... kernel lengthscale l = 0.12. We scale n with the dimensionality of the problem d, so that n = 64 for d = 2, n = 256 for d = 3, and n = 512 for d = 6... For UCB, we use β = 2.0, except in the quasi-invariant case with ε = 0.05, where we found that increasing β was necessary to achieve good performance (β = 3.0). MVR has no algorithm hyperparameters. The kernel hyperparameters are learned offline, based on 256 runs of the fusion simulator with parameters generated by low-discrepancy Sobol sampling. Due to the cost of evaluating the objective function, we use a batched version of the acquisition function as implemented by BoTorch, querying points in batches of 10. We use the multi-start optimisation routine from BoTorch for finding the maximum acquisition function value. (A BoTorch sketch of this batched acquisition step is given below.)
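The MVR and UCB pseudocode extracted above follows the standard GP-bandit template. Below is a minimal, self-contained sketch of both loops over a discrete candidate set, using a plain NumPy GP with the isotropic Matérn-5/2 base kernel; names such as `gp_posterior` and `run_bo` are illustrative and not taken from the authors' code.

```python
import numpy as np

def matern52(X1, X2, lengthscale=0.12):
    """Isotropic Matern-5/2 base kernel (lengthscale as quoted in the setup row)."""
    r = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1) / lengthscale
    return (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

def gp_posterior(X_obs, y_obs, X_cand, noise_var=1e-2):
    """Zero-mean GP posterior mean and std at the candidate points."""
    K = matern52(X_obs, X_obs) + noise_var * np.eye(len(X_obs))
    K_s = matern52(X_cand, X_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    # k(x, x) = 1 for this kernel, so the prior variance at each candidate is 1.
    var = np.clip(1.0 - np.sum(v**2, axis=0), 1e-12, None)
    return mu, np.sqrt(var)

def run_bo(f, X_cand, T=30, rule="ucb", beta=2.0, noise_std=0.1, seed=0):
    """MVR selects by posterior std; UCB by mean + beta * std (Algorithms 1-2)."""
    rng = np.random.default_rng(seed)
    X_obs = np.empty((0, X_cand.shape[1]))
    y_obs = np.empty(0)
    for t in range(T):
        if t == 0:
            idx = rng.integers(len(X_cand))              # arbitrary first query
        else:
            mu, sigma = gp_posterior(X_obs, y_obs, X_cand)
            score = sigma if rule == "mvr" else mu + beta * sigma
            idx = int(np.argmax(score))
        x_t = X_cand[idx]
        y_t = f(x_t) + noise_std * rng.standard_normal()  # noisy observation
        X_obs = np.vstack([X_obs, x_t])
        y_obs = np.append(y_obs, y_t)
    mu, _ = gp_posterior(X_obs, y_obs, X_cand)
    # MVR reports the posterior-mean maximiser; UCB reports the final query point.
    return X_cand[int(np.argmax(mu))] if rule == "mvr" else X_obs[-1]
```

With rule="mvr" the loop is pure exploration and reports the posterior-mean maximiser after T steps; with rule="ucb" it trades off exploration and exploitation and reports the final query, matching the return statements in Algorithms 1 and 2.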
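The invariant kernels referenced in the Open Source Code row are typically built by averaging a base kernel over the transformation group. The sketch below shows this orbit-averaging construction for permutation invariance; it is a generic illustration only, and the exact API of the linked repositories may differ.

```python
import itertools
import numpy as np

def matern52(X1, X2, lengthscale=0.12):
    r = np.linalg.norm(X1[:, None, :] - X2[None, :, :], axis=-1) / lengthscale
    return (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)

def permutation_invariant_kernel(X1, X2, base_kernel=matern52):
    """k_G(x, x') = (1/|G|) * sum over permutations g of k(g(x), x').

    For an isotropic base kernel this single average over one argument equals the
    double average over both, and the result is invariant to permuting either input.
    """
    dim = X1.shape[1]
    perms = list(itertools.permutations(range(dim)))
    K = np.zeros((X1.shape[0], X2.shape[0]))
    for perm in perms:
        K += base_kernel(X1[:, list(perm)], X2)  # permute coordinates of x, then apply k
    return K / len(perms)

# Example: the kernel value is unchanged when either input's coordinates are permuted.
x = np.array([[0.1, 0.5, 0.9]])
y = np.array([[0.3, 0.2, 0.7]])
y_permuted = y[:, [2, 0, 1]]
print(permutation_invariant_kernel(x, y), permutation_invariant_kernel(x, y_permuted))
```

The cost of this construction grows with the group size (full permutation invariance in 6D gives 6! = 720 terms per kernel evaluation), which is exactly the trade-off against data augmentation that the Figure 6 benchmark on the V100 GPU measures.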
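The batched acquisition described in the Experiment Setup row maps onto stock BoTorch components. The following sketch substitutes a standard SingleTaskGP on a toy objective for the paper's invariant kernel and fusion simulator, and uses qUpperConfidenceBound with optimize_acqf for the q = 10 multi-start step; exact import paths may depend on the BoTorch version, which the paper does not pin.

```python
import torch
from botorch.models import SingleTaskGP
from botorch.fit import fit_gpytorch_mll
from botorch.acquisition import qUpperConfidenceBound
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

torch.manual_seed(0)
d = 6                                                     # input dimensionality
bounds = torch.stack([torch.zeros(d), torch.ones(d)]).double()  # unit-cube search space

# Placeholder initial design; the paper instead uses 256 Sobol-sampled simulator runs
# and an invariant kernel with hyperparameters learned offline.
train_X = torch.rand(16, d, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)         # toy stand-in objective

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

acqf = qUpperConfidenceBound(model, beta=2.0)             # beta = 2.0 as in the setup row
candidates, _ = optimize_acqf(
    acq_function=acqf,
    bounds=bounds,
    q=10,              # query points in batches of 10
    num_restarts=10,   # multi-start optimisation of the acquisition function
    raw_samples=512,   # raw samples used to seed the restarts
)
print(candidates.shape)  # torch.Size([10, 6])
```

In the paper's setting the GP would instead use the invariant kernel with offline-learned hyperparameters, and the 10 candidates would be evaluated with the fusion simulator before refitting the model.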