Bayesian Optimization with Tree-structured Dependencies

Authors: Rodolphe Jenatton, Cedric Archambeau, Javier González, Matthias Seeger

ICML 2017

Reproducibility assessment (variable, result, and supporting LLM response):
Research Type: Experimental. "Our experiments on synthetic tree-structured objectives and on the tuning of feedforward neural networks show that our method compares favorably with competing approaches."
Researcher Affiliation: Industry. "¹Amazon, Berlin, Germany. ²Amazon, Cambridge, United Kingdom. Correspondence to: Rodolphe Jenatton <jenatton@amazon.de>, Cedric Archambeau <cedrica@amazon.de>, Javier Gonzalez <gojav@amazon.co.uk>, Matthias Seeger <matthias@amazon.de>."
Pseudocode: No. The paper describes procedures and mathematical models but does not contain a dedicated pseudocode or algorithm block.
Open Source Code: No. The paper states "Our implementation is in Python" but does not provide an explicit statement about open-sourcing the code or a link to a repository for its specific methodology.
Open Datasets: Yes. "To provide a robust evaluation of the different competing methods, we consider a subset of the datasets from the Libsvm repository (Chang & Lin, 2011)."
Dataset Splits: No. The paper states "In absence of pre-defined default train-test split, we took a random 80%/20% split.", which specifies only train and test splits, without explicit mention of a separate validation set or cross-validation strategy (see the data-handling sketch after this table).
Hardware Specification: Yes. "Our implementation is in Python and we ran the experiments on a fleet of Amazon AWS c4.8xlarge machines."
Software Dependencies: No. The paper mentions software such as Python and scikit-learn, and refers to the GPyOpt and SMAC implementations, but does not provide specific version numbers for these dependencies.
Experiment Setup: Yes. "We optimize for the number of hidden layers in {0, 1, 2, 3, 4}, the number of units per layer in {1, 2, ..., 30} (provided the corresponding layer is activated), the choice of the activation function in {identity, logistic, tanh, relu}, which we constrain to be identical across all layers, the amount of ℓ2 regularization in [10^-6, 10^-1], the learning rate in [10^-5, 10^-1] of the underlying Adam solver (Kingma & Ba, 2014), the tolerance in [10^-5, 10^-2] of the solver (based on relative decrease), and the type of data pre-processing, which can be unit ℓ2-norm observation-wise normalization, ℓ∞-norm feature-wise normalization, mean/standard-deviation feature-wise whitening, or no normalization at all. ... we add a CPU-time constraint of 5 minutes to each evaluation, beyond which the worst classification error 1.0 is returned." (A code sketch of this search space follows the table.)
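
As a companion to the Open Datasets and Dataset Splits rows, here is a minimal sketch of loading a Libsvm-format dataset and drawing a random 80%/20% split. It assumes scikit-learn utilities (the paper mentions scikit-learn); the file name and random seed are hypothetical placeholders, not taken from the paper.

```python
# Minimal sketch: load a Libsvm-format dataset and take a random 80%/20%
# train-test split, as described in the Dataset Splits row.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

# Hypothetical file name; any dataset from the Libsvm repository would do.
X, y = load_svmlight_file("a1a")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # random 80%/20% split
)
```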
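
To make the Experiment Setup row concrete, the sketch below restates the search space and the time-capped objective in code. It assumes scikit-learn's MLPClassifier (the paper mentions scikit-learn but does not publish its tuning code); the names search_space and evaluate_config, the dictionary layout, and the post-hoc CPU-time check are illustrative assumptions rather than the authors' implementation.

```python
# Illustrative sketch of the hyperparameter search space and the time-capped
# objective described in the Experiment Setup row (not the authors' code).
import time
from sklearn.neural_network import MLPClassifier

search_space = {
    "num_layers": [0, 1, 2, 3, 4],                    # number of hidden layers
    "units_per_layer": list(range(1, 31)),            # only active if num_layers > 0
    "activation": ["identity", "logistic", "tanh", "relu"],
    "alpha": (1e-6, 1e-1),                            # l2 regularization range
    "learning_rate_init": (1e-5, 1e-1),               # Adam learning rate range
    "tol": (1e-5, 1e-2),                              # solver tolerance range
    # Pre-processing choice (row-wise l2, feature-wise l-inf, standardization,
    # or none); its application is omitted from this sketch for brevity.
    "preprocessing": ["l2_rows", "linf_cols", "standardize", "none"],
}

TIME_BUDGET_SECONDS = 5 * 60  # 5-minute CPU-time cap per evaluation


def evaluate_config(config, X_train, y_train, X_test, y_test):
    """Train one configuration and return the test classification error in [0, 1]."""
    # An empty tuple corresponds to zero hidden layers (the num_layers = 0 case).
    hidden = tuple([config["units_per_layer"]] * config["num_layers"])
    model = MLPClassifier(
        hidden_layer_sizes=hidden,
        activation=config["activation"],
        solver="adam",
        alpha=config["alpha"],
        learning_rate_init=config["learning_rate_init"],
        tol=config["tol"],
    )
    start = time.process_time()
    model.fit(X_train, y_train)
    # Post-hoc check only: a faithful implementation would abort the evaluation
    # once the CPU-time budget is exceeded and then report the worst error 1.0.
    if time.process_time() - start > TIME_BUDGET_SECONDS:
        return 1.0
    return 1.0 - model.score(X_test, y_test)
```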