Input Warping for Bayesian Optimization of Non-Stationary Functions

Authors: Jasper Snoek, Kevin Swersky, Rich Zemel, Ryan Adams

ICML 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In the empirical study that forms the experimental part of this paper, we show that modeling non-stationarity is extremely important and yields significant empirical improvements in the performance of Bayesian optimization. (A minimal sketch of the input-warping idea follows the table.)
Researcher Affiliation | Academia | Jasper Snoek (JSNOEK@SEAS.HARVARD.EDU), School of Engineering and Applied Sciences, Harvard University; Kevin Swersky (KSWERSKY@CS.TORONTO.EDU), Department of Computer Science, University of Toronto; Richard Zemel (ZEMEL@CS.TORONTO.EDU), Department of Computer Science, University of Toronto and Canadian Institute for Advanced Research; Ryan P. Adams (RPA@SEAS.HARVARD.EDU), School of Engineering and Applied Sciences, Harvard University
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | No | The paper mentions using the third-party package 'Deepnet' but does not provide access to source code for the methodology it describes.
Open Datasets | Yes | Experiments are run on a subset of the popular CIFAR-10 data set (Krizhevsky, 2009).
Dataset Splits | No | The paper mentions datasets and training examples but does not provide specific details on training, validation, and test splits (e.g., percentages, exact counts, or specific split files).
Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models or memory) used for running the experiments are provided.
Software Dependencies | No | The paper mentions software packages like 'Deepnet' and 'Spearmint' but does not provide specific version numbers for these or other ancillary software components.
Experiment Setup | Yes | The deep network consists of three convolutional layers and two fully connected layers and we optimize over two learning rates, one for each layer type, six dropout regularization rates, six weight norm constraints, the number of hidden units per layer, a convolutional kernel size and a pooling size for a total of 21 hyperparameters. (A hypothetical sketch of this search space follows the table.)
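
For context on the "Research Type" row above: the method evaluated in the paper warps each input dimension through the CDF of a Beta distribution, so that a stationary GP kernel applied to the warped inputs can model a non-stationary objective on the original inputs. The snippet below is a minimal sketch of that idea, assuming inputs rescaled to the unit hypercube, a squared-exponential kernel, and hand-picked warping parameters; in the paper the warping and kernel hyperparameters are treated in a fully Bayesian way rather than fixed as they are here.

```python
import numpy as np
from scipy.stats import beta

def warp_inputs(X, alphas, betas):
    """Warp each input dimension through a Beta CDF (one (alpha, beta) pair per dimension)."""
    # X is an (n, d) array with every coordinate already rescaled to [0, 1].
    return np.column_stack([
        beta.cdf(X[:, j], alphas[j], betas[j]) for j in range(X.shape[1])
    ])

def se_kernel(A, B, lengthscale=0.2, amplitude=1.0):
    """Squared-exponential kernel; stationary in whatever space it is given."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return amplitude * np.exp(-0.5 * sq_dists / lengthscale ** 2)

# Toy example with hand-picked (hypothetical) warping parameters: a stationary
# kernel evaluated on the warped inputs induces a non-stationary covariance
# over the original inputs.
rng = np.random.default_rng(0)
X = rng.uniform(size=(5, 2))          # 5 points in [0, 1]^2
alphas = np.array([0.5, 2.0])
betas = np.array([2.0, 0.5])
Z = warp_inputs(X, alphas, betas)
K = se_kernel(Z, Z)
print(K.shape)                        # (5, 5) GP covariance matrix
```

Different shape parameters stretch or compress different regions of the input range, which changes the GP's effective lengthscale across the space and is what lets the warped model capture non-stationarity.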
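
The "Experiment Setup" row lists 21 hyperparameters for the convolutional network: two layer-type learning rates, six dropout rates, six weight-norm constraints, the unit/filter counts per layer (presumably one per layer, five in total, which brings the count to 21), a convolutional kernel size, and a pooling size. The quoted text gives no bounds, so the dictionary below is a hypothetical sketch of how such a search space might be written down; the names and ranges are illustrative only, not the configuration used in the paper.

```python
# Hypothetical 21-dimensional search space matching the counts in the
# "Experiment Setup" row; all names and bounds are made up for illustration.
# 2 + 6 + 6 + 5 + 1 + 1 = 21 hyperparameters.
search_space = {
    # Two learning rates, one per layer type (convolutional vs. fully connected).
    "lr_conv": ("log-uniform", 1e-5, 1e-1),
    "lr_fc": ("log-uniform", 1e-5, 1e-1),
    # Six dropout regularization rates.
    **{f"dropout_{i}": ("uniform", 0.0, 0.8) for i in range(6)},
    # Six weight-norm constraints.
    **{f"weight_norm_{i}": ("uniform", 0.1, 5.0) for i in range(6)},
    # Unit/filter counts for the five layers (three conv + two fully connected).
    **{f"units_layer_{i}": ("int", 16, 512) for i in range(5)},
    # Convolutional kernel size and pooling size.
    "conv_kernel_size": ("int", 2, 7),
    "pool_size": ("int", 2, 5),
}

assert len(search_space) == 21
```

In a Spearmint-style configuration each of these would be declared as a named variable with a type and bounds; the (kind, low, high) tuples here simply stand in for that declaration.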