Bilevel Programming for Hyperparameter Optimization and Meta-Learning

Authors: Luca Franceschi, Paolo Frasconi, Saverio Salzo, Riccardo Grazzi, Massimiliano Pontil

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The aim of the following experiments is threefold. First, we investigate the impact of the number of iterations of the optimization dynamics on the quality of the solution on a simple multiclass classification problem. Second, we test our hyper-representation method in the context of few-shot learning on two benchmark datasets. Finally, we contrast the bilevel ML approach against classical approaches to learn shared representations.
Researcher Affiliation | Academia | (1) Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; (2) Department of Computer Science, University College London, London, UK; (3) Department of Information Engineering, Università degli Studi di Firenze, Florence, Italy.
Pseudocode | Yes | Algorithm 1. Reverse-HG for Hyper-representation (a minimal sketch of the reverse-mode hypergradient idea appears after the table).
Open Source Code | Yes | The code for reproducing the experiments, based on the package FAR-HO (https://bit.ly/far-ho), is available at https://bit.ly/hyper-repr
Open Datasets | Yes | OMNIGLOT (Lake et al., 2015), a dataset that contains examples of 1623 different handwritten characters from 50 alphabets. ... MINIIMAGENET (Vinyals et al., 2016), a subset of ImageNet (Deng et al., 2009), that contains 60000 downsampled images from 100 different classes.
Dataset Splits | Yes | A training set D_tr and a validation set D_val, each consisting of three randomly drawn examples per class, were sampled to form the HO problem. ... each meta-dataset consists of a pool of samples belonging to different (non-overlapping between separate meta-datasets) classes, which can be combined to form ground classification datasets D_j = D_j^tr ∪ D_j^val with 5 or 20 classes (for Omniglot). (An illustrative episode sampler is sketched after the table.)
Hardware Specification | Yes | Table 2. Execution times on a NVidia Tesla M40 GPU.
Software Dependencies | No | The paper mentions using a package 'FAR-HO' but does not specify version numbers for this or any other software dependencies (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | The optimization of H is performed with gradient descent with momentum, with the same initialization, step size and momentum factor for each run. ... We initialize the ground models' parameters w_j to 0 and ... we perform T gradient descent steps, where T is treated as a ML hyperparameter that has to be validated. ... We compute a stochastic approximation of ∇f_T(λ) with Algorithm 1 and use Adam with decaying learning rate to optimize λ. (An outer-loop sketch combining these pieces follows the table.)
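
The Pseudocode row refers to Algorithm 1 (Reverse-HG), whose core operation is reverse-mode differentiation of the validation objective through the T unrolled steps of the inner optimization dynamics. The snippet below is a minimal sketch of that idea, not the authors' FAR-HO/TensorFlow implementation: it assumes a plain gradient-descent inner step (no momentum), and the function names and toy losses are illustrative.

```python
# Minimal reverse-mode hypergradient sketch (the idea behind Reverse-HG):
# unroll T inner gradient steps while keeping them in the autograd graph,
# then back-propagate the validation loss to the hyperparameters.
import torch

def hypergradient(hyper, w0, train_loss, val_loss, T=10, inner_lr=0.1):
    """hyper      : hyperparameter tensor lambda (requires_grad=True)
       w0         : initial inner parameters (the paper initializes them to 0)
       train_loss : callable (w, hyper) -> scalar inner objective
       val_loss   : callable (w, hyper) -> scalar outer objective f_T"""
    w = w0
    for _ in range(T):
        g = torch.autograd.grad(train_loss(w, hyper), w, create_graph=True)[0]
        w = w - inner_lr * g  # inner step, kept differentiable w.r.t. hyper
    # Back-propagating through the unrolled dynamics yields the hypergradient.
    return torch.autograd.grad(val_loss(w, hyper), hyper)[0]

# Toy usage with a quadratic inner problem (purely illustrative):
lam = torch.tensor([1.0], requires_grad=True)
w0 = torch.zeros(3, requires_grad=True)
g = hypergradient(lam, w0,
                  train_loss=lambda w, l: ((w - l) ** 2).sum(),
                  val_loss=lambda w, l: ((w - 2.0) ** 2).sum())
print(g)
```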
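
The Dataset Splits row describes episodes D_j = D_j^tr ∪ D_j^val drawn from pools of non-overlapping classes. A hypothetical sampler along those lines could look like the following; the pool structure, shot counts, and function name are assumptions rather than details taken from the released code.

```python
# Hypothetical N-way episode sampler: draw N classes from a class pool and
# split their examples into D_j^tr and D_j^val. 5-way shown here; the paper
# also reports 20-way on Omniglot.
import random

def sample_episode(pool, n_way=5, k_train=1, k_val=15, rng=random):
    """pool: dict mapping class id -> list of examples (assumed structure)."""
    classes = rng.sample(sorted(pool), n_way)
    d_tr, d_val = [], []
    for label, cls in enumerate(classes):
        examples = rng.sample(pool[cls], k_train + k_val)
        d_tr += [(x, label) for x in examples[:k_train]]
        d_val += [(x, label) for x in examples[k_train:]]
    return d_tr, d_val
```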
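
The Experiment Setup row describes computing a stochastic approximation of the hypergradient over mini-batches of episodes and updating λ with Adam under a decaying learning rate. The sketch below ties the two previous snippets together; the batch size, number of steps, learning rate, and decay schedule are placeholders, not the paper's settings.

```python
# Sketch of the outer (meta) loop: average the hypergradient over a small
# batch of sampled episodes, then update lambda with Adam and a decaying
# learning rate. All numeric settings here are illustrative.
import torch

def outer_loop(hyper, make_w0, inner_loss, outer_loss, pool,
               steps=1000, meta_batch_size=4, T=5):
    # make_w0: factory returning fresh inner parameters (e.g. zeros with
    # requires_grad=True), mirroring the paper's w_j = 0 initialization.
    opt = torch.optim.Adam([hyper], lr=1e-3)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.999)
    for _ in range(steps):
        grad = torch.zeros_like(hyper)
        for _ in range(meta_batch_size):
            d_tr, d_val = sample_episode(pool)
            grad += hypergradient(
                hyper, make_w0(),
                train_loss=lambda w, l: inner_loss(w, l, d_tr),
                val_loss=lambda w, l: outer_loss(w, l, d_val),
                T=T)
        opt.zero_grad()
        hyper.grad = grad / meta_batch_size  # stochastic hypergradient estimate
        opt.step()
        sched.step()
    return hyper
```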