Bilevel Programming for Hyperparameter Optimization and Meta-Learning
Authors: Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The aim of the following experiments is threefold. First, we investigate the impact of the number of iterations of the optimization dynamics on the quality of the solution on a simple multiclass classification problem. Second, we test our hyper-representation method in the context of few-shot learning on two benchmark datasets. Finally, we contrast the bilevel ML approach against classical approaches to learn shared representations. |
| Researcher Affiliation | Academia | (1) Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; (2) Department of Computer Science, University College London, London, UK; (3) Department of Information Engineering, Università degli Studi di Firenze, Florence, Italy |
| Pseudocode | Yes | Algorithm 1. Reverse-HG for Hyper-representation (a minimal re-implementation sketch follows the table) |
| Open Source Code | Yes | The code for reproducing the experiments, based on the package FAR-HO (https://bit.ly/far-ho), is available at https://bit.ly/hyper-repr |
| Open Datasets | Yes | OMNIGLOT (Lake et al., 2015), a dataset that contains examples of 1623 different handwritten characters from 50 alphabets. ... MINIIMAGENET (Vinyals et al., 2016), a subset of ImageNet (Deng et al., 2009), that contains 60000 downsampled images from 100 different classes. |
| Dataset Splits | Yes | A training set D_tr and a validation set D_val, each consisting of three randomly drawn examples per class, were sampled to form the HO problem. ... each meta-dataset consists of a pool of samples belonging to different (non-overlapping between separate meta-datasets) classes, which can be combined to form ground classification datasets D_j = D_j^tr ∪ D_j^val with 5 or 20 classes (for Omniglot). (See the episode-sampling sketch after the table.) |
| Hardware Specification | Yes | Table 2. Execution times on a NVidia Tesla M40 GPU. |
| Software Dependencies | No | The paper mentions using the FAR-HO package but does not specify version numbers for it or for any other software dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The optimization of H is performed with gradient descent with momentum, with the same initialization, step size and momentum factor for each run. ... We initialize ground model parameters w_j to 0 and ... we perform T gradient descent steps, where T is treated as a ML hyperparameter that has to be validated. ... We compute a stochastic approximation of f^T(λ) with Algorithm 1 and use Adam with decaying learning rate to optimize λ. (See the outer-loop sketch after the table.) |
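
To make the Pseudocode row concrete, here is a minimal PyTorch sketch of the reverse-mode hypergradient that Algorithm 1 (Reverse-HG) computes: the validation loss at the final iterate w_T is differentiated with respect to the hyperparameters λ by backpropagating through T unrolled gradient-descent steps. This is an illustration of the technique, not the authors' FAR-HO implementation; `hypergradient`, `train_loss`, and `val_loss` are hypothetical names, and the toy regularized least-squares problem is invented for the example.

```python
import torch

def hypergradient(lmbda, w0, train_loss, val_loss, T, lr):
    """Reverse-mode hypergradient d val_loss(w_T) / d lmbda,
    obtained by backpropagating through T unrolled GD steps."""
    w = w0
    for _ in range(T):
        # create_graph=True keeps each update inside the autograd
        # graph so the final backward pass traverses the dynamics
        g, = torch.autograd.grad(train_loss(w, lmbda), w, create_graph=True)
        w = w - lr * g
    return torch.autograd.grad(val_loss(w), lmbda)[0]

# Toy bilevel problem (invented): lmbda is an L2 penalty strength
torch.manual_seed(0)
Xtr, ytr = torch.randn(30, 5), torch.randn(30)
Xva, yva = torch.randn(30, 5), torch.randn(30)
tr = lambda w, l: ((Xtr @ w - ytr) ** 2).mean() + l * (w ** 2).sum()
va = lambda w: ((Xva @ w - yva) ** 2).mean()
lmbda = torch.tensor(0.1, requires_grad=True)
w0 = torch.zeros(5, requires_grad=True)
print(hypergradient(lmbda, w0, tr, va, T=50, lr=0.1))
```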
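
The Experiment Setup row describes the outer optimization: ground-model weights are re-initialized to zero, T inner gradient steps are unrolled, and λ is updated with Adam under a decaying learning rate. Below is a hedged sketch of that loop, reusing `hypergradient`, `tr`, and `va` from the block above; all step counts and learning rates here are illustrative, not the paper's values.

```python
# Outer loop: Adam with exponentially decaying learning rate on lmbda
opt = torch.optim.Adam([lmbda], lr=1e-2)
sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99)
for outer_step in range(100):
    w0 = torch.zeros(5, requires_grad=True)  # ground model restarts at 0
    opt.zero_grad()
    # plug the unrolled hypergradient in as lmbda's gradient
    lmbda.grad = hypergradient(lmbda, w0, tr, va, T=50, lr=0.1)
    opt.step()
    sched.step()
```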
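
Finally, for the Dataset Splits row, here is a small sketch of how few-shot episodes of the form D_j = D_j^tr ∪ D_j^val could be sampled from a meta-dataset pool of non-overlapping classes. `sample_episode`, `pool`, `n_way`, and `k_shot` are hypothetical names; k_shot=3 mirrors the "three randomly drawn examples per class" quoted above, and n_way would be 5 or 20 for Omniglot.

```python
import random

def sample_episode(pool, n_way=5, k_shot=3):
    """Sample one episode: n_way classes from a meta-dataset
    pool (dict: class -> list of examples), with k_shot training
    and k_shot validation examples per class."""
    classes = random.sample(sorted(pool), n_way)
    d_tr, d_val = [], []
    for c in classes:
        picks = random.sample(pool[c], 2 * k_shot)
        d_tr += [(x, c) for x in picks[:k_shot]]
        d_val += [(x, c) for x in picks[k_shot:]]
    return d_tr, d_val
```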