Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Bayesian Optimization with Gradients
Authors: Jian Wu, Matthias Poloczek, Andrew G. Wilson, Peter Frazier
NeurIPS 2017 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In numerical experiments we compare with state-of-the-art batch Bayesian optimization algorithms with and without derivative information, and the gradient-based optimizer BFGS with full gradients. |
| Researcher Affiliation | Academia | 1 Cornell University, 2 University of Arizona |
| Pseudocode | Yes | Algorithm 1 d-KG with Relevant Directional Derivative Detection |
| Open Source Code | Yes | The code for this paper is available at https://github.com/wujian16/Cornell-MOE. |
| Open Datasets | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... We tune logistic regression and a feedforward neural network with 2 hidden layers on the MNIST dataset [20], a standard classification task for handwritten digits. |
| Dataset Splits | Yes | We use the yellow cab NYC public data set from June 2016, sampling 10000 records from June 1-25 as training data and 1000 trip records from June 26-30 as validation data. ... The training set contains 60000 images, the test set 10000. |
| Hardware Specification | No | The paper discusses computational complexity and scaling (e.g., GP inference scales as $O(n^3 (d + 1)^3)$), but it does not provide specific hardware details such as GPU/CPU models or memory used for experiments. |
| Software Dependencies | No | The paper mentions using the 'emcee package' and 'scipy' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We choose $m \in [30, 200]$, $\ell_1^2 \in [10^{1}, 10^{8}]$, and $\ell_2^2, \ell_3^2, \ell_4^2, \ell_5^2$ each in $[10^{-8}, 10^{-1}]$. ... We tune 4 hyperparameters for logistic regression: the ℓ2 regularization parameter from 0 to 1, learning rate from 0 to 1, mini batch size from 20 to 2000 and training epochs from 5 to 50. ... We also experiment with two different batch sizes: we use a batch size q = 4 for the Branin, Rosenbrock, and Ackley functions; otherwise, we use a batch size q = 8. |