Scalable Hyperparameter Transfer Learning

Authors: Valerio Perrone, Rodolphe Jenatton, Matthias W. Seeger, Cedric Archambeau

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that the neural net learns a representation suitable for warm-starting the black-box optimization problems, and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss). The proposed method was found to be at least one order of magnitude faster than competing methods recently published in the literature. Section 4 presents experiments on simulated and real data, reporting favorable comparisons with existing alternatives when leveraging data across auxiliary tasks and signals.
Researcher Affiliation | Industry | Valerio Perrone, Rodolphe Jenatton, Matthias Seeger, Cédric Archambeau; Amazon, Berlin, Germany; {vperrone, jenatton, matthis, cedrica}@amazon.com
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks; it describes the BO loop in natural language. (A minimal sketch of a generic BO loop is given after this table.)
Open Source Code | No | The paper mentions using publicly available code for *other* methods (DNGO, BOHAMIANN) at a given URL, but does not provide a link or an explicit statement about releasing the source code for its own ABLR method.
Open Datasets | Yes | In Sections 4.2 and 4.3, we evaluate its potential to transfer information between tasks defined by, respectively, synthetic data and OpenML data [32]. In Section 4.4, we investigate the transfer learning ability of ABLR in the presence of multiple heterogeneous signals. OpenML data [32], LIBSVM [45]. (A short example of programmatic OpenML access is given after this table.) [32] Joaquin Vanschoren, Jan N. van Rijn, Bernd Bischl, and Luis Torgo. OpenML: Networked science in machine learning. ACM SIGKDD Explorations Newsletter, 15(2):49–60, 2014. [45] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
Dataset Splits | No | The paper mentions using a 'leave-one-task-out' protocol and tuning parameters based on 'validation error', but it does not specify explicit train/validation/test splits (e.g., percentages or sample counts) for reproducibility. (An illustrative leave-one-task-out sketch is given after this table.)
Hardware Specification | Yes | All our measurements are made on a c4.2xlarge AWS machine.
Software Dependencies | No | The paper mentions using GPyOpt [33] and MXNet [12], with Adam [40] for optimization, but does not provide specific version numbers for these software components.
Experiment Setup | Yes | The NN that learns the feature map φ_z(x) is similar to the one used in [18]. It has three fully connected layers, each with 50 units and a tanh activation function. The dimension D = 100 was picked after we investigated the computation time of ABLR-based HPO with learned NN features (D = 50) and with RKS features (D ∈ {50, 100, 200}). All BO experiments use the expected improvement acquisition function [2]. The feedforward NN was trained for 200 iterations, each time on a batch of 200 samples. (A sketch of this feature-map-plus-Bayesian-linear-regression architecture is given after this table.)
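
The Pseudocode row notes that the BO loop is described only in natural language. The following is a minimal, hypothetical sketch of such a loop using the expected improvement acquisition mentioned in the Experiment Setup row; it is not the authors' implementation. In particular, the surrogate here is a plain scikit-learn Gaussian process rather than the paper's ABLR model, and the candidate-sampling scheme, grid sizes, and seeds are illustrative assumptions.

```python
# Hypothetical sketch of a basic BO loop with expected improvement (EI).
# The surrogate is a scikit-learn GP, *not* the paper's ABLR model.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(mu, sigma, best_y, xi=0.01):
    # EI for minimization: expected improvement over the incumbent best_y.
    sigma = np.maximum(sigma, 1e-9)
    z = (best_y - mu - xi) / sigma
    return (best_y - mu - xi) * norm.cdf(z) + sigma * norm.pdf(z)

def bo_loop(black_box, bounds, n_init=5, n_iter=30, n_candidates=2000, seed=0):
    rng = np.random.default_rng(seed)
    lows = [b[0] for b in bounds]
    highs = [b[1] for b in bounds]
    dim = len(bounds)
    # Initial random design.
    X = rng.uniform(lows, highs, size=(n_init, dim))
    y = np.array([black_box(x) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(lows, highs, size=(n_candidates, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sigma, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, black_box(x_next))
    return X[np.argmin(y)], y.min()

# Example: minimize a toy quadratic over [-3, 3]^2.
best_x, best_y = bo_loop(lambda x: float(np.sum(x ** 2)), bounds=[(-3, 3), (-3, 3)])
```

The loop only needs the surrogate's predictive mean and variance to score candidates, which is the same interface an ABLR surrogate would provide.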
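
The Open Datasets row cites OpenML [32] as one of the data sources. If one wanted to retrieve OpenML datasets programmatically, the `openml` Python package offers a simple entry point; the dataset ID below is purely illustrative and is not one named by the paper.

```python
# Fetching an OpenML dataset programmatically (illustrative ID, not from the paper).
import openml

dataset = openml.datasets.get_dataset(31)  # e.g., the "credit-g" dataset
X, y, categorical, attribute_names = dataset.get_data(
    target=dataset.default_target_attribute
)
print(X.shape, len(attribute_names))
```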
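
The Dataset Splits row refers to a leave-one-task-out protocol without explicit splits. The snippet below illustrates the general shape of such a protocol, holding out each task in turn as the HPO target while the remaining tasks serve as transfer sources; the callback `run_hpo_on_target` is a placeholder, not a function from the paper.

```python
# Illustrative leave-one-task-out protocol: each task in turn is the target,
# and the evaluation histories of all remaining tasks are used for transfer.
def leave_one_task_out(task_ids, run_hpo_on_target):
    results = {}
    for target in task_ids:
        source_tasks = [t for t in task_ids if t != target]
        # run_hpo_on_target is a placeholder: it would warm-start BO on `target`
        # using data gathered on `source_tasks` and return a score.
        results[target] = run_hpo_on_target(target, source_tasks)
    return results
```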
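
The Experiment Setup row describes the feature map φ_z(x) as three fully connected layers of 50 tanh units each, on top of which a Bayesian linear regression head produces predictive means and variances. The sketch below expresses that architecture with MXNet Gluon (the framework named in the Software Dependencies row) and NumPy; fixing the precisions `alpha` and `beta`, instead of learning them jointly with the network weights by marginal-likelihood optimization as the paper does, is a simplification of this sketch.

```python
# Sketch of an ABLR-style model: a learned feature map followed by
# Bayesian linear regression. The precisions alpha/beta are fixed here,
# whereas the paper learns them (and the network weights) jointly.
import numpy as np
import mxnet as mx
from mxnet.gluon import nn

def make_feature_map(num_units=50, num_layers=3):
    # Three fully connected layers, 50 units each, tanh activations,
    # as described in the Experiment Setup row.
    net = nn.HybridSequential()
    for _ in range(num_layers):
        net.add(nn.Dense(num_units, activation="tanh"))
    net.initialize(mx.init.Xavier())
    return net

def blr_posterior(phi, y, alpha=1.0, beta=1.0):
    # Closed-form Bayesian linear regression on top of features phi (N x D):
    # posterior covariance K_inv and posterior mean weights mean_w.
    D = phi.shape[1]
    K = alpha * np.eye(D) + beta * phi.T @ phi
    K_inv = np.linalg.inv(K)
    mean_w = beta * K_inv @ phi.T @ y
    return mean_w, K_inv

def predict(net, mean_w, K_inv, x, beta=1.0):
    # Predictive mean and variance at a new hyperparameter configuration x.
    phi_x = net(mx.nd.array(x[None, :])).asnumpy().ravel()
    mu = phi_x @ mean_w
    var = 1.0 / beta + phi_x @ K_inv @ phi_x
    return mu, var

# Toy usage with random data: 20 configurations of dimension 4.
X = np.random.randn(20, 4).astype("float32")
y = np.random.randn(20)
net = make_feature_map()
features = net(mx.nd.array(X)).asnumpy()
mean_w, K_inv = blr_posterior(features, y)
mu, var = predict(net, mean_w, K_inv, X[0])
```

In the paper the network weights and the per-task BLR hyperparameters are trained jointly (200 Adam iterations on batches of 200 samples, per the row above); the closed-form posterior here only shows the prediction path.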