Scalable Bayesian Optimization Using Deep Neural Networks

Authors: Jasper Snoek, Oren Rippel, Kevin Swersky, Ryan Kiros, Nadathur Satish, Narayanan Sundaram, Md. Mostofa Ali Patwary, Prabhat, Ryan P. Adams

ICML 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of DNGO on a number of difficult problems, including benchmark problems for Bayesian optimization, convolutional neural networks for object recognition, and multi-modal neural language models for image caption generation. We find hyperparameter settings that achieve results competitive with the state of the art: error rates of 6.37% and 27.4% on CIFAR-10 and CIFAR-100 respectively, and BLEU scores of 25.1 and 26.7 on the Microsoft COCO 2014 dataset using a single model and a 3-model ensemble. (A minimal sketch of the DNGO surrogate appears after the table.)
Researcher Affiliation | Collaboration | Jasper Snoek (JSNOEK@SEAS.HARVARD.EDU), Oren Rippel (RIPPEL@MATH.MIT.EDU), Kevin Swersky (KSWERSKY@CS.TORONTO.EDU), Ryan Kiros (RKIROS@CS.TORONTO.EDU), Nadathur Satish (NADATHUR.RAJAGOPALAN.SATISH@INTEL.COM), Narayanan Sundaram (NARAYANAN.SUNDARAM@INTEL.COM), Md. Mostofa Ali Patwary (MOSTOFA.ALI.PATWARY@INTEL.COM), Prabhat (PRABHAT@LBL.GOV), Ryan P. Adams (RPA@SEAS.HARVARD.EDU); Harvard University, School of Engineering and Applied Sciences; Massachusetts Institute of Technology, Department of Mathematics; University of Toronto, Department of Computer Science; Intel Labs, Parallel Computing Lab; NERSC, Lawrence Berkeley National Laboratory
Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Available at https://github.com/orippel/micmat
Open Datasets | Yes | We optimize the hyperparameters of the log-bilinear model (LBL) from Kiros et al. (2014) to maximize the BLEU score of a validation set from the recently released COCO dataset (Lin et al., 2014). [...] We tune the hyperparameters of a deep convolutional neural network on the CIFAR-10 and CIFAR-100 datasets.
Dataset Splits | Yes | We optimize the hyperparameters of the log-bilinear model (LBL) from Kiros et al. (2014) to maximize the BLEU score of a validation set from the recently released COCO dataset (Lin et al., 2014). [...] We optimized these over a validation set of 10,000 examples drawn from the training set, running each network for 200 epochs. (See the split sketch after the table.)
Hardware Specification | Yes | We performed the optimization on a cluster of Intel® Xeon Phi™ coprocessors, with 40 jobs running in parallel using a kernel library that has been highly optimized for efficient computation on the Intel® Xeon Phi™ coprocessor. [...] The image caption generation computations in this paper were run on the Odyssey cluster supported by the FAS Division of Science, Research Computing Group at Harvard University.
Software Dependencies | No | The paper mentions a 'kernel library' but does not specify software dependencies with version numbers (e.g., Python version, specific deep learning frameworks and their versions).
Experiment Setup | Yes | We optimize learning parameters such as learning rate, momentum and batch size; regularization parameters like dropout and weight decay for word and image representations; and architectural parameters such as the context size, whether to use the additive or multiplicative version, the size of the word embeddings and the multi-modal representation size. [...] For this architecture, we tuned the momentum, learning rate, ℓ2 weight decay coefficients, dropout rates, standard deviations of the random i.i.d. Gaussian weight initializations, and corruption bounds for various data augmentations: global perturbations of hue, saturation and value, random scalings, input pixel dropout and random horizontal reflections. (See the search-space sketch after the table.)
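The DNGO surrogate described in the paper replaces the Gaussian process with a neural network whose last hidden layer provides basis functions for Bayesian linear regression on the output. The following is a minimal NumPy sketch of that final-layer regression only; the function names, the fixed precisions alpha and beta, and the random features standing in for a trained network are illustrative assumptions, not the authors' implementation (which marginalizes such hyperparameters).

```python
import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=100.0):
    """Posterior over output-layer weights for fixed basis features Phi (N x D).

    alpha: prior precision on the weights; beta: observation-noise precision.
    Both are illustrative constants here.
    """
    D = Phi.shape[1]
    S = np.linalg.inv(alpha * np.eye(D) + beta * Phi.T @ Phi)  # posterior covariance
    m = beta * S @ Phi.T @ y                                   # posterior mean
    return m, S

def blr_predict(phi_star, m, S, beta=100.0):
    """Predictive mean and variance at a new basis vector phi_star (D,)."""
    mean = phi_star @ m
    var = 1.0 / beta + phi_star @ S @ phi_star
    return mean, var

# Toy usage with random "learned" features standing in for the network's last layer.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 10))   # 50 evaluated configurations, 10 basis functions
y = rng.normal(size=50)           # their (standardized) validation losses
m, S = blr_posterior(Phi, y)
mu, var = blr_predict(rng.normal(size=10), m, S)
```

The predictive mean and variance returned here are what an acquisition function such as expected improvement would consume when proposing the next configuration.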
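The Dataset Splits row quotes a validation set of 10,000 examples drawn from the training set. A split along those lines could be reproduced as below; the index-shuffling mechanics, the 50,000-example training size (as in CIFAR), and the seed are assumptions, not the authors' code.

```python
import numpy as np

def held_out_split(n_train=50000, n_val=10000, seed=0):
    """Draw a validation set of n_val examples from n_train training indices.

    Mirrors the split described in the paper; seed and mechanics are assumptions.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_train)
    return idx[n_val:], idx[:n_val]   # (training indices, validation indices)

train_idx, val_idx = held_out_split()
```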
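The Experiment Setup row lists the hyperparameters tuned for the CIFAR convolutional networks; a search space over them might be declared as below. The parameter names and bounds are illustrative assumptions only, since the quoted text does not give the exact ranges the authors optimized over.

```python
# Hypothetical bounds for the CIFAR convolutional-network search; names and
# ranges are illustrative assumptions, not the paper's exact specification.
cnn_search_space = {
    "learning_rate":       (1e-4, 1e-1),   # typically searched on a log scale
    "momentum":            (0.5, 0.99),
    "weight_decay_l2":     (1e-6, 1e-2),
    "dropout_rate":        (0.0, 0.7),
    "init_weight_std":     (1e-3, 1e-1),   # std of i.i.d. Gaussian initialization
    "hue_perturbation":    (0.0, 0.5),     # data-augmentation corruption bounds
    "saturation_scale":    (0.0, 0.5),
    "value_scale":         (0.0, 0.5),
    "random_scaling":      (0.0, 0.3),
    "input_pixel_dropout": (0.0, 0.3),
}
```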