Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization

Authors: Samuel Daulton, Xingchen Wan, David Eriksson, Maximilian Balandat, Michael A Osborne, Eytan Bakshy

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide an empirical evaluation of PR on a suite of synthetic problems and real-world applications. For PR, we use stochastic mini-batches of N = 128 MC samples in our experiments and demonstrate that PR is robust with respect to the number of MC samples (and compare against analytic PR, where computationally feasible) in Appendix F. We optimize PR using Adam [34] with an initial learning rate of 1/40. We show that PR with Adam is generally robust to the choice of learning rate (more so than vanilla stochastic gradient ascent) in the sensitivity analysis in Figure 21 in Appendix M. We compare PR against two alternative acquisition optimization strategies: using a continuous relaxation (CONT. RELAX.) and using exact discretization with approximate gradients (EXACT ROUND) [26].
Researcher Affiliation | Collaboration | Samuel Daulton (University of Oxford, Meta) sdaulton@meta.com; Xingchen Wan (University of Oxford) xwan@robots.ox.ac.uk; David Eriksson (Meta) deriksson@meta.com; Maximilian Balandat (Meta) balandat@meta.com; Michael A. Osborne (University of Oxford) mosb@robots.ox.ac.uk; Eytan Bakshy (Meta) ebakshy@meta.com
Pseudocode | Yes | Algorithm 1: BO with PR
Open Source Code | Yes | We leverage existing open source implementations of CASMOPOLITAN and HYBO (see Appendix C for links), and the implementations of all other methods are available at https://github.com/facebookresearch/bo_pr.
Open Datasets | Yes | SVM Feature Selection: This problem involves jointly performing feature selection and hyperparameter optimization for a Support Vector Machine (SVM) trained on the CTSlice UCI data set [18, 36]. ... We fit a surrogate model to the direct arylation dataset from Shields et al. [51] in order to facilitate continuous optimization of temperature and concentration.
Dataset Splits | No | The paper does not provide explicit details about training, validation, and test dataset splits for the benchmark problems or real-world applications.
Hardware Specification | Yes | CONT. RELAX., EXACT ROUND, PR, and PR + TR are run on a single Tesla V100-SXM2-16GB GPU and other methods are run on an Intel Xeon Gold 6252N CPU.
Software Dependencies | No | The paper mentions using "Adam [34]" for optimization but does not provide version numbers for the software libraries or other key dependencies used in its implementation.
Experiment Setup | Yes | For PR, we use stochastic mini-batches of N = 128 MC samples in our experiments... We optimize PR using Adam [34] with an initial learning rate of 1/40.
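
To make the quoted acquisition-optimization settings concrete, the minimal sketch below illustrates probabilistic reparameterization over binary variables with N = 128 Monte Carlo samples and Adam with an initial learning rate of 1/40, matching the settings reported above. The acquisition function here is a hypothetical placeholder, and the REINFORCE-style score-function gradient is a stand-in for the paper's Monte Carlo gradient estimator; the authors' actual implementation is available at https://github.com/facebookresearch/bo_pr.

```python
# Sketch (not the authors' implementation) of optimizing an acquisition function
# over binary parameters via probabilistic reparameterization (PR).
# Assumptions: `acquisition` is a hypothetical placeholder; the score-function
# (REINFORCE-style) estimator stands in for the paper's MC gradient estimator.
import torch

N_MC_SAMPLES = 128          # stochastic mini-batch size reported in the paper
LEARNING_RATE = 1.0 / 40.0  # initial Adam learning rate reported in the paper
N_BINARY = 5                # number of binary design variables (illustrative)


def acquisition(z: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in for an acquisition value alpha(z)."""
    target = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])
    return -(z - target).abs().sum(dim=-1)


# Continuous logits theta parameterize Bernoulli probabilities over the discrete
# variables; optimizing theta is the probabilistic reparameterization step.
theta = torch.zeros(N_BINARY, requires_grad=True)
optimizer = torch.optim.Adam([theta], lr=LEARNING_RATE)

for step in range(100):
    optimizer.zero_grad()
    dist = torch.distributions.Bernoulli(logits=theta)
    z = dist.sample((N_MC_SAMPLES,))          # N x d discrete samples
    alpha = acquisition(z)                    # acquisition value per sample
    log_prob = dist.log_prob(z).sum(dim=-1)   # log p(z | theta)
    # Score-function estimate of -d/dtheta E_{z ~ p(.|theta)}[alpha(z)]
    loss = -(alpha.detach() * log_prob).mean()
    loss.backward()
    optimizer.step()

# After optimization, suggest the most probable discrete configuration.
z_star = (torch.sigmoid(theta) > 0.5).float()
print("suggested binary configuration:", z_star.tolist())
```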