Bayesian Optimization with Gradients
Authors: Jian Wu, Matthias Poloczek, Andrew G. Wilson, Peter Frazier
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In numerical experiments we compare with state-of-the-art batch Bayesian optimization algorithms with and without derivative information, and the gradient-based optimizer BFGS with full gradients. |
| Researcher Affiliation | Academia | Cornell University; University of Arizona |
| Pseudocode | Yes | Algorithm 1 d-KG with Relevant Directional Derivative Detection |
| Open Source Code | Yes | The code for this paper is available at https://github.com/wujian16/Cornell-MOE. |
| Open Datasets | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... We tune logistic regression and a feedforward neural network with 2 hidden layers on the MNIST dataset [20], a standard classification task for handwritten digits. |
| Dataset Splits | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... The training set contains 60000 images, the test set 10000. (A hedged data-split sketch appears below the table.) |
| Hardware Specification | No | The paper discusses computational complexity and scaling (e.g., GP inference scales as O(n^3(d+1)^3)), but it does not provide specific hardware details such as GPU/CPU models or memory used for the experiments. (See the complexity sketch below the table.) |
| Software Dependencies | No | The paper mentions using the 'emcee package' and 'scipy' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We choose m in [30, 200], ℓ1^2 in [10^1, 10^8], and ℓ2^2, ℓ3^2, ℓ4^2, ℓ5^2 each in [10^-8, 10^-1]. ... We tune 4 hyperparameters for logistic regression: the ℓ2 regularization parameter from 0 to 1, learning rate from 0 to 1, mini-batch size from 20 to 2000, and training epochs from 5 to 50. ... We also experiment with two different batch sizes: we use a batch size q = 4 for the Branin, Rosenbrock, and Ackley functions; otherwise, we use a batch size q = 8. (A search-space sketch follows the table.) |
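For the dataset splits quoted above, the following is a minimal Python sketch of how the reported partitions could be reconstructed. The CSV filename, the `tpep_pickup_datetime` column, and the sampling seed are assumptions for illustration, not details taken from the paper or its repository.

```python
# Hedged sketch of the reported data splits; the file path, column name, and
# random seed are assumptions, not taken from the paper or Cornell-MOE.
import pandas as pd

trips = pd.read_csv(
    "yellow_tripdata_2016-06.csv",            # hypothetical filename
    parse_dates=["tpep_pickup_datetime"],     # assumed pickup-time column
)

pickup_day = trips["tpep_pickup_datetime"].dt.day
train = trips[pickup_day.between(1, 25)].sample(n=10000, random_state=0)  # June 1-25
valid = trips[pickup_day.between(26, 30)].sample(n=1000, random_state=0)  # June 26-30

# MNIST keeps its standard split: 60000 training images, 10000 test images.
```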
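The O(n^3(d+1)^3) scaling mentioned in the hardware row follows from the fact that each of the n evaluated points contributes one function value plus d partial derivatives, so the joint GP covariance matrix has n(d+1) rows and columns and a dense factorization costs cubically in that size. The toy calculation below only illustrates that count; it is not code from the paper.

```python
# Minimal sketch (not from the paper's code): why gradient observations raise
# GP inference cost to O(n^3 (d+1)^3). Each of the n evaluated points yields
# the function value plus d partial derivatives, so the joint covariance
# matrix is n*(d+1) x n*(d+1); a dense Cholesky of that matrix is cubic.
def gp_inference_cost(n: int, d: int) -> int:
    """Rough operation count for factorizing the (n*(d+1)) x (n*(d+1)) covariance."""
    size = n * (d + 1)      # one value + d derivatives per evaluated point
    return size ** 3        # cubic cost of a dense factorization / solve

# e.g. 100 points in 6 dimensions: a 700 x 700 matrix
print(gp_inference_cost(100, 6))  # 343_000_000 "units" vs. 1_000_000 without gradients
```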
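The logistic-regression search domain quoted in the experiment-setup row can be written out as a plain dictionary of bounds. The layout below is purely illustrative and is not the Cornell-MOE configuration format.

```python
# Illustrative encoding of the logistic-regression search space quoted above;
# the dict layout is our own and not the interface used by the authors' code.
logreg_search_space = {
    "l2_regularization": (0.0, 1.0),   # ℓ2 penalty weight
    "learning_rate":     (0.0, 1.0),
    "mini_batch_size":   (20, 2000),   # integer-valued in practice
    "training_epochs":   (5, 50),      # integer-valued in practice
}

# Batch sizes reported in the paper: q = 4 for Branin, Rosenbrock, and Ackley,
# q = 8 for the remaining benchmarks.
batch_size_q = {"branin": 4, "rosenbrock": 4, "ackley": 4, "default": 8}
```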