Forward and Reverse Gradient-Based Hyperparameter Optimization

Authors: Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical simulations with the proposed methods. All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO. In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014) in order to minimize the response function.
Researcher Affiliation | Academia | Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; Department of Computer Science, University College London, UK; Department of Information Engineering, Università degli Studi di Firenze, Italy.
Pseudocode | Yes | Algorithm 1 REVERSE-HG; Algorithm 2 FORWARD-HG (both modes are sketched on a toy problem below the table).
Open Source Code | Yes | All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO.
Open Datasets | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset... (Section 5.1); We used CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009)... (Section 5.2); Data for all experiments was obtained from the TIMIT phonetic recognition dataset (Garofolo et al., 1993). (Section 5.3)
Dataset Splits | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset, split into three subsets: Dtr of Ntr = 5000 training examples, V of Nval = 5000 validation examples and a test set containing the remaining samples. (Section 5.1); Training, validation and test sets contain respectively 73%, 23% and 4% of the data. (Section 5.3) An illustrative construction of the MNIST split is sketched below the table.
Hardware Specification | Yes | Results are not reported since the method could not make any appreciable progress after running 24 hours on a Titan X GPU.
Software Dependencies | No | All algorithms were implemented in TensorFlow. However, no specific version number for TensorFlow or any other software dependency is provided.
Experiment Setup | Yes | In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014)...; In all the experiments we fix a minibatch size of 500.; In Experiments 3 and 4 we used a hyper-batch size of 200 (see Eq. (16)) and a hyper-learning rate of 0.005.; Vanilla: ... η and µ are set to 0.075 and 0.5 respectively... (All from Section 5 and its subsections.) An illustrative Adam outer loop with these settings is sketched below the table.
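
To make the Pseudocode row concrete, the sketch below computes the hypergradient of a validation loss with respect to an L2-regularization coefficient under plain gradient-descent training dynamics, once by forward accumulation in the spirit of FORWARD-HG and once by reverse accumulation in the spirit of REVERSE-HG. The ridge-regression objective, the synthetic data, the step size and iteration count, and the NumPy implementation are all illustrative assumptions; the authors' actual implementation is the TensorFlow package at https://github.com/lucfra/RFHO.

```python
# Illustrative sketch (not the authors' code): forward- and reverse-mode
# hypergradients of a validation loss E(w_T) w.r.t. an L2 coefficient `lam`,
# for gradient-descent dynamics w_{t+1} = w_t - eta * grad L_tr(w_t, lam).
import numpy as np

rng = np.random.default_rng(0)
n_tr, n_val, d = 50, 30, 5                       # assumed toy problem sizes
X_tr = rng.normal(size=(n_tr, d))
X_val = rng.normal(size=(n_val, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n_tr)
y_val = X_val @ w_true + 0.1 * rng.normal(size=n_val)

lam, eta, T = 0.1, 0.05, 100                     # hyperparameter, step size, training steps

def train_grad(w, lam):
    """Gradient of the regularized training loss L_tr(w, lam)."""
    return X_tr.T @ (X_tr @ w - y_tr) / n_tr + lam * w

def val_grad(w):
    """Gradient of the (unregularized) validation loss E(w)."""
    return X_val.T @ (X_val @ w - y_val) / n_val

# Jacobians of the update map Phi(w, lam) = w - eta * train_grad(w, lam):
#   A_t = dPhi/dw = I - eta * (X_tr^T X_tr / n_tr + lam * I),  B_t = dPhi/dlam = -eta * w_t
A = np.eye(d) - eta * (X_tr.T @ X_tr / n_tr + lam * np.eye(d))

# FORWARD-HG style: propagate Z_t = dw_t/dlam alongside the training run.
w, Z = np.zeros(d), np.zeros(d)
trajectory = [w.copy()]                          # stored only for the reverse pass below
for t in range(T):
    B_t = -eta * w                               # evaluated at w_t, before the update
    w = w - eta * train_grad(w, lam)
    Z = A @ Z + B_t
    trajectory.append(w.copy())
hg_forward = val_grad(w) @ Z                     # dE(w_T)/dlam

# REVERSE-HG style: backpropagate p_t = A_t^T p_{t+1} through the stored trajectory.
p = val_grad(trajectory[-1])
hg_reverse = 0.0
for t in reversed(range(T)):
    hg_reverse += p @ (-eta * trajectory[t])     # p_{t+1}^T B_t
    p = A.T @ p

print(hg_forward, hg_reverse)                    # the two modes agree up to round-off
```

Both modes traverse the same recursion, Z_{t+1} = A_t Z_t + B_t forward and p_t = A_t^T p_{t+1} backward, so their outputs coincide up to numerical precision; they differ only in memory and run-time trade-offs.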
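
The MNIST split quoted in the Dataset Splits row (a balanced 20000-example subset divided into 5000 training, 5000 validation and 10000 test examples) could be constructed roughly as follows; the scikit-learn loader, the 2000-per-class balancing and the shuffling seed are assumptions, not the authors' preprocessing.

```python
# Illustrative sketch (assumed preprocessing): a balanced 20000-example MNIST
# subset split into 5000 training, 5000 validation and 10000 test examples.
import numpy as np
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
rng = np.random.default_rng(0)                    # assumed seed

# 2000 examples per digit class -> balanced subset of 20000 examples.
idx = np.concatenate([rng.choice(np.where(y == c)[0], size=2000, replace=False)
                      for c in np.unique(y)])
rng.shuffle(idx)

train_idx, val_idx, test_idx = idx[:5000], idx[5000:10000], idx[10000:]
X_tr, y_tr = X[train_idx], y[train_idx]           # Dtr: 5000 training examples
X_val, y_val = X[val_idx], y[val_idx]             # V: 5000 validation examples
X_te, y_te = X[test_idx], y[test_idx]             # remaining 10000 test examples
```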
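
The Experiment Setup row states that hypergradients were fed to Adam with a hyper-learning rate of 0.005. The fragment below sketches such an outer loop on a single hyperparameter; the hypergradient() stand-in (a toy quadratic response instead of FORWARD-HG/REVERSE-HG), the Adam constants, the number of hyper-iterations and the non-negativity clamp are assumptions, not the reported configuration.

```python
# Illustrative sketch (assumed configuration): the hypergradient is fed to an
# Adam update on the hyperparameter with the reported hyper-learning rate 0.005.
import numpy as np

def hypergradient(lam):
    # Hypothetical stand-in for FORWARD-HG / REVERSE-HG: a toy quadratic
    # response E(lam) = (lam - 0.3)^2, used only to make the loop runnable.
    return 2.0 * (lam - 0.3)

def adam_step(theta, grad, state, lr=0.005, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update applied to the hyperparameter vector theta."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

lam = np.array([0.1])                                # hyperparameter being tuned (assumed init)
state = (np.zeros_like(lam), np.zeros_like(lam), 0)  # Adam moments and step counter
for _ in range(200):                                 # number of hyper-iterations (assumed)
    g = hypergradient(lam)                           # in the paper: dE(w_T(lam))/dlam
    lam, state = adam_step(lam, g, state)            # hyper-learning rate 0.005 as reported
    lam = np.maximum(lam, 0.0)                       # keep the coefficient feasible (assumed)
print(lam)                                           # approaches the toy optimum 0.3
```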