Forward and Reverse Gradient-Based Hyperparameter Optimization

Authors: Luca Franceschi, Michele Donini, Paolo Frasconi, Massimiliano Pontil

ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we present numerical simulations with the proposed methods. All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO. In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014) in order to minimize the response function.
Researcher Affiliation | Academia | Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; Department of Computer Science, University College London, UK; Department of Information Engineering, Università degli Studi di Firenze, Italy.
Pseudocode | Yes | Algorithm 1 REVERSE-HG; Algorithm 2 FORWARD-HG (both modes are sketched on a toy problem below the table).
Open Source Code | Yes | All algorithms were implemented in TensorFlow and the software package used to reproduce our experiments is available at https://github.com/lucfra/RFHO.
Open Datasets | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset... (Section 5.1); We used CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009)... (Section 5.2); Data for all experiments was obtained from the TIMIT phonetic recognition dataset (Garofolo et al., 1993). (Section 5.3)
Dataset Splits | Yes | We instantiated the above setting with a balanced subset of N = 20000 examples from the MNIST dataset, split into three subsets: Dtr of Ntr = 5000 training examples, V of Nval = 5000 validation examples and a test set containing the remaining samples. (Section 5.1); Training, validation and test sets contain respectively 73%, 23% and 4% of the data. (Section 5.3) An illustrative construction of the MNIST split is sketched below the table.
Hardware Specification | Yes | Results are not reported since the method could not make any appreciable progress after running 24 hours on a Titan X GPU.
Software Dependencies | No | All algorithms were implemented in TensorFlow. However, no specific version number for TensorFlow or any other software dependency is provided.
Experiment Setup | Yes | In all the experiments, hypergradients were used inside the Adam algorithm (Kingma & Ba, 2014)...; In all the experiments we fix a minibatch size of 500.; In Experiments 3 and 4 we used a hyper-batch size of 200 (see Eq. (16)) and a hyper-learning rate of 0.005.; Vanilla: ... η and µ are set to 0.075 and 0.5 respectively... (All from Section 5 and its subsections.) An illustrative Adam outer loop with these settings is sketched below the table.
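
To make the Pseudocode row concrete, the sketch below computes the hypergradient of a validation loss with respect to an L2-regularization coefficient under plain gradient-descent training dynamics, once by forward accumulation in the spirit of FORWARD-HG and once by reverse accumulation in the spirit of REVERSE-HG. The ridge-regression objective, the synthetic data, the step size and iteration count, and the NumPy implementation are all illustrative assumptions; the authors' actual implementation is the TensorFlow package at https://github.com/lucfra/RFHO.

```python
# Illustrative sketch (not the authors' code): forward- and reverse-mode
# hypergradients of a validation loss E(w_T) w.r.t. an L2 coefficient `lam`,
# for gradient-descent dynamics w_{t+1} = w_t - eta * grad L_tr(w_t, lam).
import numpy as np

rng = np.random.default_rng(0)
n_tr, n_val, d = 50, 30, 5                       # assumed toy problem sizes
X_tr = rng.normal(size=(n_tr, d))
X_val = rng.normal(size=(n_val, d))
w_true = rng.normal(size=d)
y_tr = X_tr @ w_true + 0.1 * rng.normal(size=n_tr)
y_val = X_val @ w_true + 0.1 * rng.normal(size=n_val)

lam, eta, T = 0.1, 0.05, 100                     # hyperparameter, step size, training steps

def train_grad(w, lam):
    """Gradient of the regularized training loss L_tr(w, lam)."""
    return X_tr.T @ (X_tr @ w - y_tr) / n_tr + lam * w

def val_grad(w):
    """Gradient of the (unregularized) validation loss E(w)."""
    return X_val.T @ (X_val @ w - y_val) / n_val

# Jacobians of the update map Phi(w, lam) = w - eta * train_grad(w, lam):
#   A_t = dPhi/dw = I - eta * (X_tr^T X_tr / n_tr + lam * I),  B_t = dPhi/dlam = -eta * w_t
A = np.eye(d) - eta * (X_tr.T @ X_tr / n_tr + lam * np.eye(d))

# FORWARD-HG style: propagate Z_t = dw_t/dlam alongside the training run.
w, Z = np.zeros(d), np.zeros(d)
trajectory = [w.copy()]                          # stored only for the reverse pass below
for t in range(T):
    B_t = -eta * w                               # evaluated at w_t, before the update
    w = w - eta * train_grad(w, lam)
    Z = A @ Z + B_t
    trajectory.append(w.copy())
hg_forward = val_grad(w) @ Z                     # dE(w_T)/dlam

# REVERSE-HG style: backpropagate p_t = A_t^T p_{t+1} through the stored trajectory.
p = val_grad(trajectory[-1])
hg_reverse = 0.0
for t in reversed(range(T)):
    hg_reverse += p @ (-eta * trajectory[t])     # p_{t+1}^T B_t
    p = A.T @ p

print(hg_forward, hg_reverse)                    # the two modes agree up to round-off
```

Both modes traverse the same recursion, Z_{t+1} = A_t Z_t + B_t forward and p_t = A_t^T p_{t+1} backward, so their outputs coincide up to numerical precision; they differ only in memory and run-time trade-offs.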
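
The MNIST split quoted in the Dataset Splits row (a balanced 20000-example subset divided into 5000 training, 5000 validation and 10000 test examples) could be constructed roughly as follows; the scikit-learn loader, the 2000-per-class balancing and the shuffling seed are assumptions, not the authors' preprocessing.

```python
# Illustrative sketch (assumed preprocessing): a balanced 20000-example MNIST
# subset split into 5000 training, 5000 validation and 10000 test examples.
import numpy as np
from sklearn.datasets import fetch_openml

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
rng = np.random.default_rng(0)                    # assumed seed

# 2000 examples per digit class -> balanced subset of 20000 examples.
idx = np.concatenate([rng.choice(np.where(y == c)[0], size=2000, replace=False)
                      for c in np.unique(y)])
rng.shuffle(idx)

train_idx, val_idx, test_idx = idx[:5000], idx[5000:10000], idx[10000:]
X_tr, y_tr = X[train_idx], y[train_idx]           # Dtr: 5000 training examples
X_val, y_val = X[val_idx], y[val_idx]             # V: 5000 validation examples
X_te, y_te = X[test_idx], y[test_idx]             # remaining 10000 test examples
```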
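
The Experiment Setup row states that hypergradients were fed to Adam with a hyper-learning rate of 0.005. The fragment below sketches such an outer loop on a single hyperparameter; the hypergradient() stand-in (a toy quadratic response instead of FORWARD-HG/REVERSE-HG), the Adam constants, the number of hyper-iterations and the non-negativity clamp are assumptions, not the reported configuration.

```python
# Illustrative sketch (assumed configuration): the hypergradient is fed to an
# Adam update on the hyperparameter with the reported hyper-learning rate 0.005.
import numpy as np

def hypergradient(lam):
    # Hypothetical stand-in for FORWARD-HG / REVERSE-HG: a toy quadratic
    # response E(lam) = (lam - 0.3)^2, used only to make the loop runnable.
    return 2.0 * (lam - 0.3)

def adam_step(theta, grad, state, lr=0.005, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update applied to the hyperparameter vector theta."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

lam = np.array([0.1])                                # hyperparameter being tuned (assumed init)
state = (np.zeros_like(lam), np.zeros_like(lam), 0)  # Adam moments and step counter
for _ in range(200):                                 # number of hyper-iterations (assumed)
    g = hypergradient(lam)                           # in the paper: dE(w_T(lam))/dlam
    lam, state = adam_step(lam, g, state)            # hyper-learning rate 0.005 as reported
    lam = np.maximum(lam, 0.0)                       # keep the coefficient feasible (assumed)
print(lam)                                           # approaches the toy optimum 0.3
```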