The Parallel Knowledge Gradient Method for Batch Bayesian Optimization
Authors: Jian Wu, Peter I. Frazier
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our experiments on both synthetic functions and tuning practical machine learning algorithms, q-KG consistently finds better function values than other parallel BO algorithms, such as parallel EI [2, 19, 25], batch UCB [5] and parallel UCB with exploration [3]. q-KG provides especially large value when function evaluations are noisy. |
| Researcher Affiliation | Academia | Jian Wu, Peter I. Frazier, Cornell University, Ithaca, NY 14853, {jw926, pf98}@cornell.edu |
| Pseudocode | Yes | Algorithm 1: The q-KG algorithm (a hedged Monte Carlo sketch of the acquisition and the outer loop appears after this table) |
| Open Source Code | Yes | The code in this paper is available at https://github.com/wujian16/qKG. |
| Open Datasets | Yes | First, we tune logistic regression on the MNIST dataset... In the second experiment, we tune a CNN on CIFAR10 dataset. |
| Dataset Splits | Yes | We train logistic regression on a training set with 60000 instances with a given set of hyperparameters and test it on a test set with 10000 instances. ... We train the CNN on the 50000 training data with certain hyperparameters and test it on the test set with 10000 instances. (These are the standard splits; see the loading snippet after this table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software such as C++, a Python interface, GP regression and GP hyperparameter fitting methods, the Metrics Optimization Engine, Spearmint, and Gpoptimization, but it does not specify version numbers for any of these components. |
| Experiment Setup | Yes | We set the batch size to q = 4. ... We initiate our algorithms by randomly sampling 2d + 2 points from a Latin hypercube design, where d is the dimension of the problem. ... We use a constant mean prior and the ARD Matérn 5/2 kernel. ... We set M = 1000 to discretize the domain following the strategy in Section 5.3. ... We tune 4 hyperparameters: mini-batch size from 10 to 2000, training iterations from 100 to 10000, the ℓ2 regularization parameter from 0 to 1, and learning rate from 0 to 1. (These settings are wired together in the driver sketch after this table.) |
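
The Pseudocode row points at Algorithm 1, which alternates between maximizing the q-KG factor and evaluating the chosen batch of points in parallel. Below is a minimal Monte Carlo sketch of the q-KG factor only, not the authors' C++ implementation: it conditions a scikit-learn GP on fantasy observations at a candidate batch and measures the expected drop in the posterior minimum over a discretization A of the domain. The function name `qkg_monte_carlo`, the fantasy count, and the noise level are illustrative choices, not values from the paper.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor


def qkg_monte_carlo(gp, X, y, Z, A, n_fantasies=64, noise=1e-3, seed=None):
    """Monte Carlo estimate of the q-KG factor for a candidate batch Z.

    gp : GaussianProcessRegressor already fit on (X, y)
    Z  : (q, d) candidate batch
    A  : (M, d) discretized domain (cf. Section 5.3 of the paper)
    """
    rng = np.random.default_rng(seed)
    best_n = gp.predict(A).min()                 # mu*_n under minimization

    # Joint posterior at the batch; add observation noise before sampling.
    m_Z, C_Z = gp.predict(Z, return_cov=True)
    L = np.linalg.cholesky(C_Z + noise * np.eye(len(Z)))

    gains = []
    for _ in range(n_fantasies):
        y_f = m_Z + L @ rng.standard_normal(len(Z))   # fantasy observations
        # Condition on the fantasies with the kernel held fixed (optimizer=None).
        gp_f = GaussianProcessRegressor(kernel=gp.kernel_, optimizer=None,
                                        alpha=noise, normalize_y=True)
        gp_f.fit(np.vstack([X, Z]), np.concatenate([y, y_f]))
        gains.append(best_n - gp_f.predict(A).min())  # mu*_n - mu*_{n+q}
    return float(np.mean(gains))  # nonnegative in expectation
```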
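The Experiment Setup row translates almost directly into a driver loop. The sketch below wires those settings together under stated assumptions: plain uniform sampling stands in for the paper's Latin hypercube design, `normalize_y=True` approximates the constant mean prior, and scoring random candidate batches replaces the paper's multistart stochastic gradient ascent (via infinitesimal perturbation analysis) for maximizing q-KG.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def qkg_batch_bo(f, bounds, q=4, n_iters=10, M=1000, n_candidates=50, seed=0):
    """Batch BO driver loop; uses qkg_monte_carlo from the sketch above."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, dtype=float).T
    d = len(lo)

    # 2d + 2 initial points; the paper draws them from a Latin hypercube
    # design, plain uniform sampling is a simplification here.
    X = rng.uniform(lo, hi, size=(2 * d + 2, d))
    y = np.array([f(x) for x in X])

    for _ in range(n_iters):
        # One length scale per dimension = ARD Matern 5/2;
        # normalize_y=True stands in for the constant mean prior.
        kernel = Matern(length_scale=np.ones(d), nu=2.5)
        gp = GaussianProcessRegressor(kernel=kernel, alpha=1e-3,
                                      normalize_y=True).fit(X, y)

        A = rng.uniform(lo, hi, size=(M, d))  # M = 1000 discretization points

        # The paper maximizes q-KG by stochastic gradient ascent (IPA);
        # this sketch just scores random candidate batches.
        Z_best, v_best = None, -np.inf
        for _ in range(n_candidates):
            Z = rng.uniform(lo, hi, size=(q, d))
            v = qkg_monte_carlo(gp, X, y, Z, A, n_fantasies=16)
            if v > v_best:
                Z_best, v_best = Z, v

        y_new = np.array([f(z) for z in Z_best])  # evaluate the batch in parallel
        X, y = np.vstack([X, Z_best]), np.concatenate([y, y_new])

    return X[np.argmin(y)], float(y.min())


# Toy usage on a 2-D quadratic (purely illustrative, not a paper benchmark):
# x_best, y_best = qkg_batch_bo(lambda x: np.sum((x - 0.5) ** 2),
#                               bounds=[(0, 1), (0, 1)], n_iters=3)
```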
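The split sizes in the Dataset Splits row are the standard MNIST and CIFAR-10 train/test partitions. The paper does not say how the data were loaded; the torchvision calls below are just one way to reproduce those exact counts.

```python
# The paper does not name a data-loading framework; these torchvision calls
# are one standard way to obtain exactly the quoted split sizes.
from torchvision import datasets

mnist_train = datasets.MNIST("data", train=True, download=True)    # 60000
mnist_test = datasets.MNIST("data", train=False, download=True)    # 10000
cifar_train = datasets.CIFAR10("data", train=True, download=True)  # 50000
cifar_test = datasets.CIFAR10("data", train=False, download=True)  # 10000
```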