Predictive Entropy Search for Bayesian Optimization with Unknown Constraints
Authors: Jose Miguel Hernandez-Lobato, Michael Gelbart, Matthew Hoffman, Ryan Adams, Zoubin Ghahramani
ICML 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the performance of PESC and show that it compares favorably to EI-based approaches on synthetic and benchmark problems, as well as several real-world examples. |
| Researcher Affiliation | Academia | Harvard University, Cambridge, MA 02138 USA, University of Cambridge, Cambridge, CB2 1PZ, UK |
| Pseudocode | No | The paper describes the algorithm mathematically but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Therefore, we have integrated our implementation, which carefully addresses these numerical issues, into the open-source Bayesian optimization package Spearmint at https://github.com/HIPS/Spearmint/tree/PESC. |
| Open Datasets | Yes | The network is trained on the MNIST digit classification task... sample from the posterior distribution of a logistic regression problem using the UCI German credit data set (Frank & Asuncion, 2010). |
| Dataset Splits | Yes | The objective is reported as the classification error rate on the validation set. |
| Hardware Specification | Yes | prediction time must not exceed 2 ms on a GeForce GTX 580 GPU (also used for training). |
| Software Dependencies | No | The paper mentions the 'deepnet' package, the 'coda' R package, and the 'PyMC' Python package but does not specify version numbers for these or other key software components. |
| Experiment Setup | Yes | PESC uses 10 samples from p(x|Dn) when approximating the expectation in (7). We use the AL implementation provided by Gramacy et al. (2014) in the R package laGP... In all three methods, the GP hyperparameters are estimated by maximum likelihood. In this experiment and the next, the y-axis represents observed objective values, δ1 = 0.05, a Matérn 5/2 GP covariance kernel is used, and GP hyperparameters are integrated out using slice sampling (Neal, 2000) as in Snoek et al. (2012). Curves are the mean over 5 independent experiments. The network is trained using the deepnet package... and the prediction time is computed as the average time of 1000 predictions, each for a batch of size 128. The network is trained on the MNIST digit classification task with momentum-based stochastic gradient descent for 5000 iterations. |
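The experiment setup quotes a Matérn 5/2 GP covariance kernel. For readers checking this detail, a minimal sketch of that kernel in NumPy is shown below; the function name and parameters (`lengthscale`, `variance`) are illustrative, not taken from the paper's Spearmint code.

```python
import numpy as np

def matern52(x1, x2, lengthscale=1.0, variance=1.0):
    # Hypothetical helper: Matern 5/2 covariance between two points,
    # k(r) = variance * (1 + sqrt(5) r + 5 r^2 / 3) * exp(-sqrt(5) r),
    # where r is the scaled Euclidean distance between x1 and x2.
    r = np.linalg.norm(np.asarray(x1, float) - np.asarray(x2, float)) / lengthscale
    s = np.sqrt(5.0) * r
    return variance * (1.0 + s + s * s / 3.0) * np.exp(-s)
```

The kernel equals `variance` at zero distance and decays monotonically, which is why it is a common default for smooth-but-not-too-smooth objectives in Bayesian optimization.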