Forward and Reverse Gradient-Based Hyperparameter Optimization
Authors: Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present numerical simulations with the proposed methods. All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO. In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014) in order to minimize the response function. |
| Researcher Affiliation | Academia | Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; Department of Computer Science, University College London, UK; Department of Information Engineering, Università degli Studi di Firenze, Italy. |
| Pseudocode | Yes | Algorithm 1 REVERSE-HG; Algorithm 2 FORWARD-HG (a minimal forward-mode sketch is given below the table) |
| Open Source Code | Yes | All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO. |
| Open Datasets | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset... (Section 5.1); We used CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009)... (Section 5.2); Data for all experiments was obtained from the TIMIT phonetic recognition dataset (Garofolo et al., 1993). (Section 5.3) |
| Dataset Splits | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset, split into three subsets: Dtr of Ntr = 5000 training examples, V of Nval = 5000 validation examples and a test set containing the remaining samples. (Section 5.1); Training, validation and test sets contain respectively 73%, 23% and 4% of the data. (Section 5.3) |
| Hardware Specification | Yes | Results are not reported since the method could not make any appreciable progress after running 24 hours on a Titan X GPU. |
| Software Dependencies | No | All algorithms were implemented in TensorFlow. However, no specific version number for TensorFlow or any other software dependency is provided. |
| Experiment Setup | Yes | In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014)...; In all the experiments we fix a minibatch size of 500.; In Experiments 3 and 4 we used a hyper-batch size of Δ = 200 (see Eq. (16)) and a hyper-learning rate of 0.005.; Vanilla: ... η and µ are set to 0.075 and 0.5 respectively... (All from Section 5 and its subsections; a hedged sketch of this hypergradient-plus-Adam setup follows the table.) |
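
The paper's FORWARD-HG (Algorithm 2) propagates a tangent Z_t = dw_t/dλ alongside the training dynamics and contracts it with the validation gradient at the end of training. Below is a minimal NumPy sketch of that forward accumulation for a single scalar hyperparameter (the learning rate η of plain gradient descent). The quadratic losses, dimensions, and variable names are illustrative assumptions, not the authors' RFHO code.

```python
# Forward-mode hypergradient accumulation, in the spirit of FORWARD-HG,
# for the dynamics w_{t+1} = w_t - eta * grad L_tr(w_t).
# The losses below are toy stand-ins chosen so the Jacobians are explicit.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)); A = A @ A.T + d * np.eye(d)  # SPD Hessian of L_tr
b = rng.standard_normal(d)
w_star = rng.standard_normal(d)                               # target in the validation loss

grad_tr  = lambda w: A @ w - b        # gradient of L_tr(w) = 1/2 w'Aw - b'w
hess_tr  = lambda w: A                # its (constant) Hessian
grad_val = lambda w: w - w_star       # gradient of E(w) = 1/2 ||w - w_star||^2

def forward_hg(eta, T=100):
    """Return dE(w_T)/d eta, accumulated forward in time, plus w_T."""
    w = np.zeros(d)
    Z = np.zeros(d)                   # Z_t = dw_t / d eta, with Z_0 = 0
    for _ in range(T):
        g, H = grad_tr(w), hess_tr(w)
        Z = Z - eta * (H @ Z) - g     # tangent of the update: (I - eta H_t) Z_t - g_t
        w = w - eta * g               # the training update itself
    return grad_val(w) @ Z, w

hg, w_T = forward_hg(eta=0.01)
print("hypergradient dE/d eta =", hg)
```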
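
The quoted setup then feeds such hypergradients to Adam as the outer optimizer, with the hyper-learning rate of 0.005 reported in the table. The sketch below continues the previous one: a scalar Adam update on η driven by the hypergradient. The default Adam constants and the positivity clamp are assumptions for illustration, not details taken from the paper.

```python
# Outer loop: minimize the response function E(w_T(eta)) over eta with Adam,
# using the forward_hg routine from the sketch above as the hypergradient oracle.
import numpy as np

def adam_outer_loop(hypergrad_fn, eta0=0.01, hyper_lr=0.005, steps=50,
                    beta1=0.9, beta2=0.999, eps=1e-8):
    eta, m, v = eta0, 0.0, 0.0
    for k in range(1, steps + 1):
        hg, _ = hypergrad_fn(eta)                 # dE/d eta from a forward pass
        m = beta1 * m + (1 - beta1) * hg          # first-moment estimate
        v = beta2 * v + (1 - beta2) * hg ** 2     # second-moment estimate
        m_hat, v_hat = m / (1 - beta1 ** k), v / (1 - beta2 ** k)
        eta = max(1e-6, eta - hyper_lr * m_hat / (np.sqrt(v_hat) + eps))  # keep eta > 0
    return eta

# Assumes forward_hg from the previous sketch is in scope.
print("tuned learning rate:", adam_outer_loop(forward_hg))
```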