A Quantile-based Approach for Hyperparameter Transfer Learning
Authors: David Salinas, Huibin Shen, Valerio Perrone
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate significant improvements over state-of-the-art methods for both hyperparameter optimization and neural architecture search. |
| Researcher Affiliation | Industry | NAVER LABS Europe (work started while being at Amazon); Amazon Web Services. Correspondence to: David Salinas <david.salinas@naverlabs.com>, Huibin Shen <huibishe@amazon.com>, Valerio Perrone <vperrone@amazon.com>. |
| Pseudocode | Yes | Pseudo-code is given in Algorithm 1. ... Pseudo-code is given in Algorithm 2. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the methodology's code. |
| Open Datasets | Yes | We consider three algorithms in the HPO context: XGBoost (Chen & Guestrin, 2016), a 2-layer feed-forward neural network (FCNET) (Klein & Hutter, 2019), and the RNN-based time series prediction model proposed in Salinas et al. (2017) (DeepAR). ... We also run experiments on NAS-Bench-201 (Dong & Yang, 2020). ... The list of the datasets is in the appendix. |
| Dataset Splits | Yes | We compute tabular evaluations (log) uniformly beforehand on multiple datasets to compare methods with sufficiently many random repetitions. ... The transfer learning capabilities of each method are evaluated in a leave-one-task-out setting: one dataset is sequentially left out to assess how much transfer can be achieved from the other datasets, and overall results are aggregated. (A sketch of this leave-one-task-out loop is given after the table.) |
| Hardware Specification | Yes | We run each experiment with 30 random seeds on AWS batch with m4.xlarge instances. |
| Software Dependencies | No | The paper mentions using 'GPareto (Binois & Picheny, 2019)' but does not specify its version number or any other software dependencies with explicit version details. |
| Experiment Setup | Yes | The MLP hwh(x) used to regress µθ and σθ has 3 layers with 50 nodes, a dropout rate of 0.1 after each hidden layer and relu activation functions. The learning rate is set to 0.01, and ADAM is run over 1000 gradient updates three times, lowering the learning rate by 5 each time with a batch size of 64. (A sketch of this setup follows the table.) |
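As a point of reference, below is a minimal sketch of the MLP configuration quoted in the Experiment Setup row, written in PyTorch (the paper does not state its framework in the quoted text). The layer sizes, dropout rate, ReLU activations, learning rate, batch size, and the three rounds of 1000 ADAM updates follow the quote; the separate output heads for µθ and σθ, the softplus used to keep σθ positive, the Gaussian negative log-likelihood objective, and the reading of "lowering the learning rate by 5" as dividing it by a factor of 5 are assumptions made for illustration.

```python
# Minimal sketch (assumptions noted above): an MLP h_w(x) with 3 hidden layers
# of 50 units, ReLU activations, and dropout 0.1, regressing mu and sigma.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


class QuantileRegressorMLP(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 50, dropout: float = 0.1):
        super().__init__()
        layers, dim = [], input_dim
        for _ in range(3):  # 3 hidden layers of 50 nodes, per the quoted setup
            layers += [nn.Linear(dim, hidden_dim), nn.ReLU(), nn.Dropout(dropout)]
            dim = hidden_dim
        self.body = nn.Sequential(*layers)
        self.mu_head = nn.Linear(hidden_dim, 1)     # assumed output head for mu
        self.sigma_head = nn.Linear(hidden_dim, 1)  # assumed output head for sigma

    def forward(self, x):
        h = self.body(x)
        mu = self.mu_head(h)
        # Softplus keeps sigma strictly positive (an assumption, not stated in the quote).
        sigma = nn.functional.softplus(self.sigma_head(h)) + 1e-6
        return mu, sigma


def train(model, x, y, lr=0.01, restarts=3, steps=1000, batch_size=64, lr_decay=5.0):
    """Run ADAM for `steps` updates, `restarts` times, dividing the lr by `lr_decay` each round."""
    dataset = TensorDataset(x, y)
    for r in range(restarts):
        opt = torch.optim.Adam(model.parameters(), lr=lr / (lr_decay ** r))
        loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
        done = 0
        while done < steps:
            for xb, yb in loader:
                mu, sigma = model(xb)
                yb = yb.view_as(mu)
                # Gaussian negative log-likelihood (assumed objective for regressing mu, sigma).
                loss = (torch.log(sigma) + 0.5 * ((yb - mu) / sigma) ** 2).mean()
                opt.zero_grad()
                loss.backward()
                opt.step()
                done += 1
                if done >= steps:
                    break
    return model
```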
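The Dataset Splits row describes a leave-one-task-out evaluation: each dataset is held out in turn, transfer is learned from the remaining datasets, HPO is run on the held-out task over 30 random seeds (per the Hardware Specification row), and results are aggregated. The sketch below shows that outer loop only; `fit_transfer_model`, `run_hpo`, and `aggregate` are hypothetical placeholders, not functions from the paper or its code.

```python
# Minimal sketch of the leave-one-task-out protocol quoted in the Dataset Splits row.
# All three callables are hypothetical placeholders supplied by the caller.
def leave_one_task_out(tasks, fit_transfer_model, run_hpo, aggregate, seeds=range(30)):
    results = []
    for held_out in tasks:
        source_tasks = [t for t in tasks if t is not held_out]
        # Fit the transfer model on every task except the held-out one ...
        prior = fit_transfer_model(source_tasks)
        # ... then run HPO on the held-out task, once per random seed.
        for seed in seeds:
            results.append(run_hpo(held_out, prior=prior, seed=seed))
    # Aggregate over held-out tasks and seeds to obtain the overall comparison.
    return aggregate(results)
```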