Predictive Entropy Search for Bayesian Optimization with Unknown Constraints

Authors: José Miguel Hernández-Lobato, Michael Gelbart, Matthew Hoffman, Ryan Adams, Zoubin Ghahramani

ICML 2015

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analyze the performance of PESC and show that it compares favorably to EI-based approaches on synthetic and benchmark problems, as well as several real-world examples. |
| Researcher Affiliation | Academia | Harvard University, Cambridge, MA 02138 USA; University of Cambridge, Cambridge, CB2 1PZ, UK |
| Pseudocode | No | The paper describes the algorithm mathematically but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Therefore, we have integrated our implementation, which carefully addresses these numerical issues, into the open-source Bayesian optimization package Spearmint at https://github.com/HIPS/Spearmint/tree/PESC. |
| Open Datasets | Yes | The network is trained on the MNIST digit classification task... sample from the posterior distribution of a logistic regression problem using the UCI German credit data set (Frank & Asuncion, 2010). |
| Dataset Splits | Yes | The objective is reported as the classification error rate on the validation set. |
| Hardware Specification | Yes | prediction time must not exceed 2 ms on a GeForce GTX 580 GPU (also used for training). |
| Software Dependencies | No | The paper mentions the deepnet package, the coda R package, and the PyMC Python package, but does not specify version numbers for these or other key software components. |
| Experiment Setup | Yes | PESC uses 10 samples from p(x\* \| Dn) when approximating the expectation in (7). We use the AL implementation provided by Gramacy et al. (2014) in the R package laGP... In all three methods, the GP hyperparameters are estimated by maximum likelihood. In this experiment and the next, the y-axis represents observed objective values, δ1 = 0.05, a Matérn 5/2 GP covariance kernel is used, and GP hyperparameters are integrated out using slice sampling (Neal, 2000) as in Snoek et al. (2012). Curves are the mean over 5 independent experiments. The network is trained using the deepnet package... and the prediction time is computed as the average time of 1000 predictions, each for a batch of size 128. The network is trained on the MNIST digit classification task with momentum-based stochastic gradient descent for 5000 iterations. (See the sketches after this table.) |
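For context on the GP setup quoted in the Experiment Setup row (a Matérn 5/2 covariance kernel with hyperparameters estimated by maximum likelihood), here is a minimal sketch. It uses scikit-learn rather than the authors' Spearmint implementation, and the toy data are assumptions made purely for illustration, not the paper's benchmarks.

```python
# Minimal sketch (not Spearmint) of the surrogate described above:
# a Matern 5/2 kernel whose hyperparameters are fit by maximizing the
# GP log marginal likelihood (type-II maximum likelihood).
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, Matern

rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 2))                      # toy 2-D design points (assumption)
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(20)

kernel = ConstantKernel(1.0) * Matern(length_scale=1.0, nu=2.5)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)                                       # ML estimation of kernel hyperparameters

# Posterior mean and standard deviation at new candidate points.
mu, sigma = gp.predict(rng.uniform(size=(5, 2)), return_std=True)
```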
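The setup also integrates out GP hyperparameters with slice sampling (Neal, 2000), as in Snoek et al. (2012). A self-contained univariate slice sampler looks roughly as follows; the Gaussian log-density in the usage line is a stand-in for the GP log marginal likelihood, which is an assumption for illustration.

```python
# A minimal univariate slice sampler in the style of Neal (2000), with
# the step-out bracketing and shrinkage procedure. Illustrative only.
import numpy as np

def slice_sample(log_density, x0, n_samples, width=1.0, seed=0):
    rng = np.random.default_rng(seed)
    samples, x = [], x0
    for _ in range(n_samples):
        log_y = log_density(x) + np.log(rng.random())  # slice level under the curve
        left = x - width * rng.random()                # random bracket containing x
        right = left + width
        while log_density(left) > log_y:               # step out left edge
            left -= width
        while log_density(right) > log_y:              # step out right edge
            right += width
        while True:                                    # shrink until a point is accepted
            x_new = rng.uniform(left, right)
            if log_density(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# Toy usage: a standard-normal log-density stands in for the GP marginal likelihood.
draws = slice_sample(lambda t: -0.5 * t**2, x0=0.0, n_samples=100)
```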