Bayesian Optimization with Tree-structured Dependencies
Authors: Rodolphe Jenatton, Cedric Archambeau, Javier González, Matthias Seeger
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on synthetic tree-structured objectives and on the tuning of feedforward neural networks show that our method compares favorably with competing approaches. |
| Researcher Affiliation | Industry | 1Amazon, Berlin, Germany. 2Amazon, Cambridge, United Kingdom. Correspondence to: Rodolphe Jenatton <jenatton@amazon.de>, Cedric Archambeau <cedrica@amazon.de>, Javier Gonzalez <gojav@amazon.co.uk>, Matthias Seeger <matthias@amazon.de>. |
| Pseudocode | No | The paper describes procedures and mathematical models but does not contain a dedicated pseudocode or algorithm block. |
| Open Source Code | No | The paper states 'Our implementation is in Python' but neither states that the code is open-sourced nor links to a repository for the proposed method. |
| Open Datasets | Yes | To provide a robust evaluation of the different competing methods, we consider a subset of the datasets from the Libsvm repository (Chang & Lin, 2011). |
| Dataset Splits | No | The paper states 'In absence of pre-defined default train-test split, we took a random 80%–20% split.', which specifies only train and test splits, with no explicit mention of a separate validation set or cross-validation strategy (a minimal split sketch appears after the table). |
| Hardware Specification | Yes | Our implementation is in Python and we ran the experiments on a fleet of Amazon AWS c4.8xlarge machines. |
| Software Dependencies | No | The paper mentions software such as Python and scikit-learn, and refers to the GPyOpt and SMAC implementations, but does not provide version numbers for these dependencies. |
| Experiment Setup | Yes | We optimize for the number of hidden layers in {0, 1, 2, 3, 4}, the number of units per layer in {1, 2, ..., 30} (provided the corresponding layer is activated), the choice of the activation function in {identity, logistic, tanh, relu}, which we constrain to be identical across all layers, the amount of ℓ2 regularization in [10⁻⁶, 10⁻¹], the learning rate in [10⁻⁵, 10⁻¹] of the underlying Adam solver (Kingma & Ba, 2014), the tolerance in [10⁻⁵, 10⁻²] of the solver (based on relative decrease), and the type of data pre-processing, which can be unit ℓ2-norm observation-wise normalization, ℓ∞-norm feature-wise normalization, mean/standard-deviation feature-wise whitening, or no normalization at all. ... we add a CPU-time constraint of 5 minutes to each evaluation, beyond which the worst classification error 1.0 is returned. (A hedged sketch of this search space follows the table.) |
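
The random 80%–20% protocol quoted under Dataset Splits is straightforward to reproduce. Below is a minimal sketch assuming scikit-learn (which the paper mentions) and its loader for Libsvm-format files; the dataset file name and random seed are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch of the split protocol quoted above: a random 80%/20%
# train-test split when a dataset has no pre-defined default split.
# The file name and seed are illustrative assumptions.
from sklearn.datasets import load_svmlight_file
from sklearn.model_selection import train_test_split

X, y = load_svmlight_file("a1a")  # any Libsvm-format dataset file
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```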
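
The Experiment Setup row describes a tree-structured search space: the per-layer unit counts only matter when the corresponding layers exist. The sketch below shows one way such a configuration could be encoded and mapped onto scikit-learn's MLPClassifier; the dictionary layout, key names, and `build_model` helper are assumptions for illustration, not the authors' implementation, and the paper's 5-minute CPU-time cap is noted only as a comment.

```python
# Illustrative encoding of the tree-structured search space quoted above.
# Names and layout are assumptions; the ranges follow the paper's setup.
from sklearn.neural_network import MLPClassifier

search_space = {
    "n_hidden_layers": [0, 1, 2, 3, 4],
    "units": list(range(1, 31)),    # per layer, active only if the layer exists
    "activation": ["identity", "logistic", "tanh", "relu"],  # shared by layers
    "l2_reg": (1e-6, 1e-1),         # continuous range
    "learning_rate": (1e-5, 1e-1),  # Adam learning rate
    "tol": (1e-5, 1e-2),            # relative-decrease stopping tolerance
    "preprocessing": ["l2_rows", "linf_cols", "standardize", "none"],
}

def build_model(config):
    """Map one sampled configuration to an MLP (sketch only).

    Only the first `n_hidden_layers` unit counts are read, which is exactly
    the conditional (tree-structured) dependency the paper exploits.
    """
    hidden = tuple(config["units"][: config["n_hidden_layers"]])
    return MLPClassifier(
        hidden_layer_sizes=hidden,   # empty tuple when 0 layers are chosen
        activation=config["activation"],
        alpha=config["l2_reg"],      # scikit-learn's l2 penalty parameter
        solver="adam",
        learning_rate_init=config["learning_rate"],
        tol=config["tol"],
    )

# The paper additionally caps each evaluation at 5 CPU-minutes and returns
# the worst classification error 1.0 on timeout; enforcing that cap (e.g.
# via a subprocess timeout) is left outside this sketch.
```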