Predictive Entropy Search for Bayesian Optimization with Unknown Constraints

Authors: José Miguel Hernández-Lobato, Michael Gelbart, Matthew Hoffman, Ryan Adams, Zoubin Ghahramani

ICML 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the performance of PESC and show that it compares favorably to EI-based approaches on synthetic and benchmark problems, as well as several real-world examples. |
| Researcher Affiliation | Academia | Harvard University, Cambridge, MA 02138 USA; University of Cambridge, Cambridge, CB2 1PZ, UK |
| Pseudocode | No | The paper describes the algorithm mathematically but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Therefore, we have integrated our implementation, which carefully addresses these numerical issues, into the open-source Bayesian optimization package Spearmint at https://github.com/HIPS/Spearmint/tree/PESC. |
| Open Datasets | Yes | The network is trained on the MNIST digit classification task... sample from the posterior distribution of a logistic regression problem using the UCI German credit data set (Frank & Asuncion, 2010). |
| Dataset Splits | Yes | The objective is reported as the classification error rate on the validation set. |
| Hardware Specification | Yes | prediction time must not exceed 2 ms on a GeForce GTX 580 GPU (also used for training). |
| Software Dependencies | No | The paper mentions the deepnet package, the coda R package, and the PyMC Python package, but does not specify version numbers for these or other key software components. |
| Experiment Setup | Yes | PESC uses 10 samples from p(x\* \| Dn) when approximating the expectation in (7). We use the AL implementation provided by Gramacy et al. (2014) in the R package laGP... In all three methods, the GP hyperparameters are estimated by maximum likelihood. In this experiment and the next, the y-axis represents observed objective values, δ1 = 0.05, a Matérn 5/2 GP covariance kernel is used, and GP hyperparameters are integrated out using slice sampling (Neal, 2000) as in Snoek et al. (2012). Curves are the mean over 5 independent experiments. The network is trained using the deepnet package... and the prediction time is computed as the average time of 1000 predictions, each for a batch of size 128. The network is trained on the MNIST digit classification task with momentum-based stochastic gradient descent for 5000 iterations. (See the sketches after this table.) |
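For context on the GP setup quoted in the Experiment Setup row (a Matérn 5/2 covariance kernel with hyperparameters estimated by maximum likelihood), here is a minimal sketch. It uses scikit-learn rather than the authors' Spearmint implementation, and the toy data are assumptions made purely for illustration, not the paper's benchmarks.

```python
# Minimal sketch (not Spearmint) of the surrogate described above:
# a Matern 5/2 kernel whose hyperparameters are fit by maximizing the
# GP log marginal likelihood (type-II maximum likelihood).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))                      # toy 2-D design points (assumption)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)

kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)                                       # ML estimation of kernel hyperparameters

# Posterior mean and standard deviation at new candidate points.
mu, sigma = gp.predict(rng.uniform(size=(5, 2)), return_std=True)
```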
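The setup also integrates out GP hyperparameters with slice sampling (Neal, 2000), as in Snoek et al. (2012). A self-contained univariate slice sampler looks roughly as follows; the Gaussian log-density in the usage line is a stand-in for the GP log marginal likelihood, which is an assumption for illustration.

```python
# A minimal univariate slice sampler in the style of Neal (2000), with
# the step-out bracketing and shrinkage procedure. Illustrative only.
import numpy as np

def slice_sample(log_density, x0, n_samples, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples, x = [], x0
    for _ in range(n_samples):
        log_y = log_density(x) + np.log(rng.random())  # slice level under the curve
        left = x - width * rng.random()                # random bracket containing x
        right = left + width
        while log_density(left) > log_y:               # step out left edge
            left -= width
        while log_density(right) > log_y:              # step out right edge
            right += width
        while True:                                    # shrink until a point is accepted
            x_new = rng.uniform(left, right)
            if log_density(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# Toy usage: a standard-normal log-density stands in for the GP marginal likelihood.
draws = slice_sample(lambda t: -0.5 * t**2, x0=0.0, n_samples=100)
```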