BOSS: Bayesian Optimization over String Spaces

Authors: Henry Moss, David Leslie, Daniel Beck, Javier González, Paul Rayson

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now evaluate our proposed BO framework on tasks from a range of fields and syntactical constraints. Our code is available at github.com/henrymoss/BOSS and is built upon the Emukit Python package [Paleyes et al., 2019]. All results are based on runs across 15 random seeds, showing the mean and a single standard error of the best objective value found as we increase the optimization budget.
Researcher Affiliation | Collaboration | Henry B. Moss, STOR-i Centre for Doctoral Training, Lancaster University, UK, h.moss@lancaster.ac.uk; Daniel Beck, Computing and Information Systems, University of Melbourne, Australia, d.beck@unimelb.edu.au; Javier González, Microsoft Research Cambridge, UK; David S. Leslie, Dept. of Mathematics and Statistics, Lancaster University, UK; Paul Rayson, School of Computing and Communications, Lancaster University, UK
Pseudocode | No | The paper describes algorithms in text and through figures but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at github.com/henrymoss/BOSS and is built upon the Emukit Python package [Paleyes et al., 2019].
Open Datasets | Yes | We replicate the symbolic regression example of Kusner et al. [2017], using their provided VAEs pre-trained for this exact problem. ...large collection of 250,000 candidate molecules used by Kusner et al. [2017]...
Dataset Splits | No | The paper discusses training and testing for different models but does not provide explicit details on train/validation/test dataset splits (percentages or counts) for its own experiments.
Hardware Specification | Yes | Although acquisition function calculations could be parallelized across the populations of our GA at each BO step, we use a single-core Intel Xeon 2.30GHz processor to paint a clear picture of computational cost.
Software Dependencies | No | The paper mentions building upon the 'Emukit Python package' but does not provide specific version numbers for Emukit or Python, which are necessary for full reproducibility of software dependencies.
Experiment Setup | Yes | All results are based on runs across 15 random seeds, showing the mean and a single standard error of the best objective value found as we increase the optimization budget. ... After a random initialization of min(5, |Σ|) evaluations, kernel parameters are re-estimated to maximize model likelihood before each BO step. ... Our genetic algorithms (ga) limited to 100 evolutions of a population of size 100.
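
The reporting protocol quoted under Experiment Setup (mean and a single standard error of the best objective value found, across 15 random seeds) can be illustrated with a minimal Python sketch. This is not the authors' code: `best_so_far`, `mean_and_standard_error`, and `toy_run` are hypothetical names, `toy_run` returns random numbers in place of a real BO run, and the sketch assumes the objective is maximized.

```python
import math
import random
import statistics


def best_so_far(values):
    """Running maximum of the objective values seen in one run (assumes maximization)."""
    best, trace = -math.inf, []
    for v in values:
        best = max(best, v)
        trace.append(best)
    return trace


def mean_and_standard_error(traces):
    """Per-step mean and one standard error across runs of equal length."""
    n_runs = len(traces)
    means, errors = [], []
    for step_values in zip(*traces):
        means.append(statistics.mean(step_values))
        # Standard error = sample standard deviation / sqrt(number of runs).
        errors.append(statistics.stdev(step_values) / math.sqrt(n_runs))
    return means, errors


def toy_run(seed, budget=20):
    """Hypothetical stand-in for one BO run: random objective values, not real BO."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(budget)]


N_SEEDS = 15  # number of random seeds stated in the paper
traces = [best_so_far(toy_run(seed)) for seed in range(N_SEEDS)]
means, errors = mean_and_standard_error(traces)
```

Plotting `means` with a band of plus or minus `errors` against the evaluation count would give a curve of the kind the quoted passage describes. For a minimization task, `max` and `-math.inf` would be swapped for `min` and `math.inf`.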