Bayesian Optimization with Gradients

Authors: Jian Wu, Matthias Poloczek, Andrew G. Wilson, Peter Frazier

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In numerical experiments we compare with state-of-the-art batch Bayesian optimization algorithms with and without derivative information, and the gradient-based optimizer BFGS with full gradients.
Researcher Affiliation | Academia | 1: Cornell University, 2: University of Arizona
Pseudocode | Yes | Algorithm 1: d-KG with Relevant Directional Derivative Detection (an illustrative sketch of a derivative-aware BO loop follows the table)
Open Source Code | Yes | The code for this paper is available at https://github.com/wujian16/Cornell-MOE.
Open Datasets | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... We tune logistic regression and a feedforward neural network with 2 hidden layers on the MNIST dataset [20], a standard classification task for handwritten digits.
Dataset Splits | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... The training set contains 60000 images, the test set 10000. (a split sketch follows the table)
Hardware Specification | No | The paper discusses computational complexity and scaling (e.g., exact GP inference scales as O(n³(d+1)³)), but it does not provide specific hardware details such as GPU/CPU models or memory used for the experiments. (a worked scaling example follows the table)
Software Dependencies | No | The paper mentions using the 'emcee package' and 'scipy' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We choose m in [30, 200], ℓ₁² in [10¹, 10⁸], and ℓ₂², ℓ₃², ℓ₄², ℓ₅² each in [10⁻⁸, 10⁻¹]. ... We tune 4 hyperparameters for logistic regression: the ℓ2 regularization parameter from 0 to 1, learning rate from 0 to 1, mini-batch size from 20 to 2000 and training epochs from 5 to 50. ... We also experiment with two different batch sizes: we use a batch size q = 4 for the Branin, Rosenbrock, and Ackley functions; otherwise, we use a batch size q = 8. (a search-space sketch follows the table)
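The Pseudocode row refers to Algorithm 1 (d-KG with Relevant Directional Derivative Detection). The paper's algorithm is not reproduced here; the sketch below is only a rough, hedged illustration of the underlying idea: a 1-D Gaussian process conditioned on both function values and derivative observations (standard RBF derivative covariances) driving a simple sequential lower-confidence-bound loop. The acquisition, toy objective, noise level, and sequential (rather than batch) setting are placeholder choices of mine, not the paper's d-KG acquisition.

```python
import numpy as np

def rbf_blocks(x1, x2, ell=1.0, s2=1.0):
    """Joint RBF covariance of [f, f'] at x1 versus [f, f'] at x2 (1-D inputs)."""
    r = x1[:, None] - x2[None, :]
    k = s2 * np.exp(-0.5 * r**2 / ell**2)
    k_f_df = k * r / ell**2                         # cov(f(x1), f'(x2))
    k_df_f = -k * r / ell**2                        # cov(f'(x1), f(x2))
    k_df_df = k * (1.0 / ell**2 - r**2 / ell**4)    # cov(f'(x1), f'(x2))
    return np.block([[k, k_f_df], [k_df_f, k_df_df]])

def posterior(x_tr, y, dy, x_te, noise=1e-6):
    """Posterior mean/std of f at x_te, given values y and derivatives dy at x_tr (ell = s2 = 1)."""
    K = rbf_blocks(x_tr, x_tr) + noise * np.eye(2 * len(x_tr))
    r = x_te[:, None] - x_tr[None, :]
    k = np.exp(-0.5 * r**2)
    K_star = np.hstack([k, k * r])                  # cross-covariance of f(x_te) with [y, dy]
    alpha = np.linalg.solve(K, np.concatenate([y, dy]))
    mean = K_star @ alpha
    var = 1.0 - np.einsum("ij,ij->i", K_star, np.linalg.solve(K, K_star.T).T)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

def f(x):   # placeholder objective with an analytic gradient (not from the paper)
    return np.sin(3.0 * x) + 0.1 * x**2

def df(x):
    return 3.0 * np.cos(3.0 * x) + 0.2 * x

rng = np.random.default_rng(0)
x_obs = rng.uniform(-2.0, 2.0, size=3)
grid = np.linspace(-2.0, 2.0, 401)
for _ in range(10):
    cand = np.setdiff1d(grid, x_obs)                # avoid re-evaluating an already chosen point
    mean, std = posterior(x_obs, f(x_obs), df(x_obs), cand)
    x_next = cand[np.argmin(mean - 2.0 * std)]      # simple lower-confidence-bound acquisition
    x_obs = np.append(x_obs, x_next)
print("best x found:", x_obs[np.argmin(f(x_obs))])
```

The point of the sketch is the block covariance: conditioning on derivatives only changes the kernel matrix, after which posterior inference and acquisition optimization proceed as in ordinary Bayesian optimization.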
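The Dataset Splits row quotes only split sizes and date ranges. The pandas sketch below shows one plausible way to realize that split; the file name, timestamp column, and sampling seed are assumptions based on the public TLC release, not details given in the paper.

```python
import pandas as pd

# Assumed file and column names for the June 2016 yellow-cab release (not specified in the paper).
trips = pd.read_csv("yellow_tripdata_2016-06.csv", parse_dates=["tpep_pickup_datetime"])
day = trips["tpep_pickup_datetime"].dt.day

# 10000 training records from June 1-25 and 1000 validation records from June 26-30,
# matching the sizes reported in the paper; the random seed is arbitrary.
train = trips[day <= 25].sample(n=10000, random_state=0)
valid = trips[day >= 26].sample(n=1000, random_state=0)

# MNIST uses its standard 60000/10000 train/test split, e.g. via Keras:
# (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
```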
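The Hardware Specification row notes that the paper reports only the O(n³(d+1)³) cost of exact GP inference with gradients. As a quick worked illustration (the numbers are mine, not measurements from the paper): each of the n evaluations contributes d+1 correlated outputs, so the joint covariance matrix has n(d+1) rows, and a Cholesky factorization scales cubically in that size.

```python
# Illustrative scaling only; the (n, d) pairs below are arbitrary, not from the paper.
for n, d in [(50, 3), (200, 6), (500, 8)]:
    m = n * (d + 1)   # rows/columns of the gradient-augmented covariance matrix
    print(f"n={n:4d}, d={d}: matrix {m}x{m}, Cholesky cost ~ {m**3:.2e} flops (up to a constant)")
```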
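Finally, the Experiment Setup row lists the tuned ranges. The dictionaries below restate those ranges as a search-space configuration; the parameter names, the dict-of-bounds representation, and any log-scale interpretation are my own conventions, not the paper's configuration format.

```python
# Search spaces restated from the quoted setup; the representation is an assumption.
kiss_gp_space = {
    "m": (30, 200),                                        # number of inducing points
    "ell1_sq": (1e1, 1e8),                                 # first squared length scale
    **{f"ell{i}_sq": (1e-8, 1e-1) for i in range(2, 6)},   # remaining squared length scales
}

logistic_regression_space = {
    "l2_regularization": (0.0, 1.0),
    "learning_rate": (0.0, 1.0),
    "mini_batch_size": (20, 2000),
    "training_epochs": (5, 50),
}

# Batch sizes used in the benchmarks: q = 4 for Branin, Rosenbrock, Ackley; q = 8 otherwise.
batch_size = {"branin": 4, "rosenbrock": 4, "ackley": 4, "default": 8}
```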