The Parallel Knowledge Gradient Method for Batch Bayesian Optimization

Authors: Jian Wu, Peter Frazier

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments on both synthetic functions and tuning practical machine learning algorithms, q-KG consistently finds better function values than other parallel BO algorithms, such as parallel EI [2, 19, 25], batch UCB [5] and parallel UCB with exploration [3]. q-KG provides especially large value when function evaluations are noisy.
Researcher Affiliation | Academia | Jian Wu, Peter I. Frazier, Cornell University, Ithaca, NY 14853, {jw926, pf98}@cornell.edu
Pseudocode | Yes | Algorithm 1: The q-KG algorithm (see the acquisition sketch after this table)
Open Source Code | Yes | The code in this paper is available at https://github.com/wujian16/qKG.
Open Datasets | Yes | First, we tune logistic regression on the MNIST dataset... In the second experiment, we tune a CNN on the CIFAR10 dataset.
Dataset Splits | Yes | We train logistic regression on a training set with 60000 instances with a given set of hyperparameters and test it on a test set with 10000 instances. ... We train the CNN on the 50000 training instances with certain hyperparameters and test it on the test set with 10000 instances.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper mentions software such as C++, a Python interface, GP regression and GP hyperparameter fitting methods, the Metrics Optimization Engine, Spearmint, and Gpoptimization, but it does not specify version numbers for any of these components.
Experiment Setup | Yes | We set the batch size to q = 4. ... We initiate our algorithms by randomly sampling 2d + 2 points from a Latin hypercube design, where d is the dimension of the problem. ... We use a constant mean prior and the ARD Matérn 5/2 kernel. ... We set M = 1000 to discretize the domain following the strategy in Section 5.3. ... We tune 4 hyperparameters: mini-batch size from 10 to 2000, training iterations from 100 to 10000, the ℓ2 regularization parameter from 0 to 1, and learning rate from 0 to 1. (see the setup sketch after this table)
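To make the Pseudocode row concrete: the q-KG acquisition measures the expected increase in the maximum of the GP posterior mean after adding a batch of q evaluations, and Algorithm 1 repeatedly picks the batch that maximizes it. Below is a minimal NumPy sketch of a plain Monte Carlo estimator of that quantity on a discretized domain. It is an illustration only, not the paper's released implementation (which uses a more efficient estimator with infinitesimal-perturbation-analysis gradients and continuous optimization); the names qkg_mc_estimate, mu_n, cov_n, batch_idx, and noise_var are assumptions made here for the sketch.

```python
import numpy as np

def qkg_mc_estimate(mu_n, cov_n, batch_idx, n_samples=1000, noise_var=1e-6, rng=None):
    """Plain Monte Carlo estimate of the q-KG value of a candidate batch.

    mu_n      : (M,) current posterior mean over a discretized domain A.
    cov_n     : (M, M) current posterior covariance over the same M points.
    batch_idx : indices into A of the q candidate batch points.
    Returns an estimate of E[max_x mu_new(x)] - max_x mu_n(x).
    """
    rng = np.random.default_rng() if rng is None else rng
    q = len(batch_idx)
    # Covariance blocks needed for the one-step update of the posterior mean.
    K_zz = cov_n[np.ix_(batch_idx, batch_idx)] + noise_var * np.eye(q)
    K_Az = cov_n[:, batch_idx]                      # (M, q)
    L = np.linalg.cholesky(K_zz)
    # Under the current posterior, the batch observations y satisfy
    # y - mu_n[batch] ~ N(0, K_zz), so the updated mean
    # mu_new = mu_n + K_Az K_zz^{-1} (y - mu_n[batch])
    # has the same distribution as mu_n + (K_Az L^{-T}) w with w ~ N(0, I_q).
    sigma_tilde = np.linalg.solve(L, K_Az.T).T      # K_Az L^{-T}, shape (M, q)
    w = rng.standard_normal((n_samples, q))
    updated_means = mu_n[None, :] + w @ sigma_tilde.T   # (n_samples, M)
    return updated_means.max(axis=1).mean() - mu_n.max()
```

In the batch loop of Algorithm 1, each iteration would select the batch maximizing this value (the paper optimizes over the continuous domain rather than over index sets), evaluate the q points in parallel, refit the GP, and repeat until the budget is spent.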
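Likewise, the Experiment Setup row can be illustrated with a short sketch of the initialization and GP prior it describes: 2d + 2 Latin hypercube points over the 4-dimensional logistic-regression search space and an ARD Matérn 5/2 kernel. This is a sketch assuming SciPy's qmc module for the design; the length scales and amplitude below are placeholder values, whereas the paper fits its GP hyperparameters from data.

```python
import numpy as np
from scipy.stats import qmc

# Search space from the paper's logistic-regression experiment:
# mini-batch size, training iterations, l2 regularization, learning rate.
lower = np.array([10.0, 100.0, 0.0, 0.0])
upper = np.array([2000.0, 10000.0, 1.0, 1.0])
d = len(lower)                                   # d = 4

# Initial design: 2d + 2 points from a Latin hypercube, as stated in the paper.
sampler = qmc.LatinHypercube(d=d, seed=0)
X0 = qmc.scale(sampler.random(n=2 * d + 2), lower, upper)   # shape (10, 4)

def ard_matern52(X1, X2, lengthscales, amplitude=1.0):
    """ARD Matern 5/2 kernel; lengthscales and amplitude are placeholders here."""
    diff = X1[:, None, :] / lengthscales - X2[None, :, :] / lengthscales
    r = np.sqrt(np.sum(diff ** 2, axis=-1))
    return amplitude * (1.0 + np.sqrt(5.0) * r + 5.0 * r ** 2 / 3.0) * np.exp(-np.sqrt(5.0) * r)

K = ard_matern52(X0, X0, lengthscales=(upper - lower) / 10.0)
print(X0.shape, K.shape)                          # (10, 4) (10, 10)
```

The batch size q = 4 and the M = 1000 domain discretization mentioned in the same row are not shown here; they enter only when the acquisition above is optimized.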