On the Iteration Complexity of Hypergradient Computation

Authors: Riccardo Grazzi, Luca Franceschi, Massimiliano Pontil, Saverio Salzo

ICML 2020

Reproducibility assessment: each variable lists the assessed result and the supporting excerpt (LLM response) from the paper.
Research Type: Experimental
Evidence: "We present an extensive experimental comparison among the methods which confirm the theoretical findings." (Section 3, Experiments)
Researcher Affiliation: Academia
Evidence: "1Computational Statistics and Machine Learning, Istituto Italiano di Tecnologia, Genoa, Italy; 2Department of Computer Science, University College London, London, UK."
Pseudocode: Yes
Evidence: Algorithm 1, Iterative Differentiation (ITD), and Algorithm 2, Approximate Implicit Differentiation (AID).
Open Source Code: Yes
Evidence: "The algorithms have been implemented in PyTorch (Paszke et al., 2019)." The code is freely available at https://github.com/prolearner/hypertorch. (The paper shorthands AID-FP and AID-CG as FP and CG, respectively.)
Open Datasets: Yes
Evidence: the UCI Parkinson dataset (Little et al., 2008); 20 Newsgroups (http://qwone.com/~jason/20Newsgroups/), which contains 18,000 newsgroup posts divided into 20 topics, with features consisting of 101,631 tf-idf sparse vectors; and the Fashion-MNIST dataset (Xiao et al., 2017).
Dataset Splits: Yes
Evidence: "We split the data randomly into three equal parts to make the train, validation and test sets."
Hardware Specification: No
Evidence: "in the case of 20 newsgroup for some t between 50 and 100, this cost exceeded the 11GB on the GPU." This mentions a GPU and its memory capacity, but no specific model or other detailed hardware specifications.
Software Dependencies: No
Evidence: "The algorithms have been implemented in PyTorch (Paszke et al., 2019)." PyTorch is named, but no specific version number required for reproduction is provided.
Experiment Setup: Yes
Evidence: "We set h = 200 and use t = 20 fixed-point iterations to solve the lower-level problem in all the experiments."; "We solve each problem using (hyper)gradient descent with fixed step size selected via grid search (additional details are provided in Appendix C.2)."; "We used t = k = 20 for all methods and Nesterov momentum for optimizing λ."
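To make the two algorithm families assessed above concrete, here is a minimal, dependency-free sketch of the hypergradient computations the paper studies: ITD (differentiating through t unrolled lower-level iterations) and AID with a fixed-point linear solver (AID-FP). The toy scalar bilevel problem, the step sizes, and all function names below are illustrative assumptions, not the paper's actual implementation (the authors' PyTorch code at https://github.com/prolearner/hypertorch uses reverse-mode autograd on vector problems):

```python
def solve_lower(lam, x, t=20, lr=0.5):
    """Run t fixed-point (gradient) iterations on a toy lower-level
    problem min_w 0.5*(w - x)**2 + 0.5*lam*w**2 (scalar, illustrative)."""
    w = 0.0
    for _ in range(t):
        w -= lr * ((1 + lam) * w - x)  # inner gradient step
    return w

def itd_hypergrad(lam, x, y, t=20, lr=0.5):
    """ITD flavour (cf. Algorithm 1): differentiate the upper-level
    objective f(lam) = 0.5*(w_t(lam) - y)**2 through the unrolled
    trajectory, here via forward-mode propagation of dw/dlam."""
    w, dw = 0.0, 0.0                    # iterate and its derivative in lam
    for _ in range(t):
        grad_w = (1 + lam) * w - x      # inner gradient in w
        dgrad = w + (1 + lam) * dw      # d(grad_w)/d(lam), forward mode
        w, dw = w - lr * grad_w, dw - lr * dgrad
    return (w - y) * dw                 # chain rule through w_t(lam)

def aid_fp_hypergrad(lam, x, y, t=20, k=20, lr=0.5):
    """AID flavour with a fixed-point solver (cf. Algorithm 2, AID-FP):
    approximately solve H v = outer_grad, H the inner Hessian, then
    assemble the implicit-function hypergradient."""
    w = solve_lower(lam, x, t, lr)
    H = 1 + lam                         # d^2 g / dw^2 (scalar Hessian)
    cross = w                           # d^2 g / (dw dlam)
    v = 0.0
    for _ in range(k):                  # k fixed-point iterations
        v -= lr * (H * v - (w - y))
    return -cross * v
```

For this quadratic toy problem the exact lower-level solution is w*(lam) = x/(1+lam), so both approximations can be checked against the closed-form hypergradient (x/(1+lam) - y) * (-x/(1+lam)**2); with enough iterations the two methods agree, mirroring the paper's comparison of ITD and AID at matched iteration budgets t = k.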