Amortized Implicit Differentiation for Stochastic Bilevel Optimization

Authors: Michael Arbel, Julien Mairal

ICLR 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We run three sets of experiments described in Sections 5.1 to 5.3. In all cases, we consider AmIGO with either gradient descent (AmIGO-GD) or conjugate gradient (AmIGO-CG) for algorithm Bk. We compare AmIGO with AID methods without warm-start for Bk, which we refer to as (AID-GD) and (AID-CG), and with (AID-CG-WS), which uses warm-start for Bk but not for Ak. We also consider other variants using either a fixed-point algorithm (AID-FP) (Grazzi et al., 2020) or Neumann series expansion (AID-N) (Lorraine et al., 2020) for Bk. Finally, we consider two algorithms based on iterative differentiation, which we refer to as (ITD) (Grazzi et al., 2020) and (Reverse) (Franceschi et al., 2017). For all methods except (AID-CG-WS), we use warm-start in algorithm Ak; however, only AmIGO, AmIGO-CG, and AID-CG-WS exploit warm-start in Bk, the other AID-based methods initializing Bk with z0 = 0. In Sections 5.2 and 5.3, we also compare with the BSA algorithm (Ghadimi and Wang, 2018), the TTSA algorithm (Hong et al., 2020a), and stocBiO (Ji et al., 2021). An implementation of AmIGO is available in https://github.com/MichaelArbel/AmIGO. Figure 1 caption: top row, performance on the synthetic task; bottom row, performance on the hyper-parameter optimization task. (A hedged code sketch of this algorithmic structure is given after the table.)
Researcher Affiliation | Academia | Michael Arbel & Julien Mairal, Univ. Grenoble Alpes, Inria, CNRS, Grenoble INP, LJK, 38000 Grenoble, France.
Pseudocode | Yes | Algorithm 1 AmIGO, Algorithm 2 Ak(x, y0), Algorithm 3 Bk(x, y, v, z0).
Open Source Code | Yes | An implementation of AmIGO is available in https://github.com/MichaelArbel/AmIGO.
Open Datasets | Yes | We consider a classification task on the 20Newsgroup dataset. Figure 5 of Appendix F.3 shows the training loss (outer loss), the training and test accuracies of a model trained on MNIST by dataset distillation.
Dataset Splits | No | The outer-level cost functions for such task take the following form: $f(x, y) = \frac{1}{|\mathcal{D}_{\mathrm{val}}|} \sum_{\xi \in \mathcal{D}_{\mathrm{val}}} \mathcal{L}(y, \xi)$, $g(x, y) = \frac{1}{|\mathcal{D}_{\mathrm{tr}}|} \sum_{\xi \in \mathcal{D}_{\mathrm{tr}}} \mathcal{L}(y, \xi)$... and optimized using an unregularized regression loss over the validation set while the model is learned using the training set. (Mentions use of training and validation sets, but does not provide specific percentages, counts, or explicit instructions for dataset splits needed for reproducibility; see the second sketch after the table.)
Hardware Specification | No | No specific hardware details (like GPU/CPU models, memory, or cloud instance types) used for running the experiments are mentioned in the paper.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9) are mentioned in the paper.
Experiment Setup | Yes | For the default setting, we use the well-chosen parameters reported in Grazzi et al. (2020); Ji et al. (2021) where αk = γk = 100, βk = 0.5, and T = N = 10. For the grid-search setting, we select the best-performing parameters T, M, and βk from a grid {10, 20} × {5, 10} × {0.5, 10}, while the batch size (chosen to be the same for all steps of the algorithms) varies in 10 × {0.1, 1, 2, 4}. (See the grid-search sketch after the table.)
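
To make the structure quoted in the Research Type and Pseudocode rows concrete, below is a minimal PyTorch sketch of the three components: an inner algorithm Ak that runs gradient steps on g, a warm-started linear solver Bk for the implicit-gradient system, and the resulting hypergradient. The function names (A_k, B_k, hypergrad, hvp), the step sizes, and the plain gradient-descent solver are illustrative assumptions, not the authors' code (which lives at the GitHub link above).

```python
# Minimal sketch of the AmIGO structure (Algorithms 1-3), assuming a generic
# differentiable inner loss g(x, y) and outer loss f(x, y).
import torch

def hvp(loss, y, v):
    """Hessian-vector product (d^2 loss / dy^2) @ v via double backward."""
    grad_y = torch.autograd.grad(loss, y, create_graph=True)[0]
    return torch.autograd.grad(grad_y, y, grad_outputs=v)[0]

def A_k(x, y, g, T=10, gamma=0.1):
    """Inner algorithm: T gradient steps on g(x, .), warm-started at the previous y."""
    y = y.detach().clone().requires_grad_(True)
    for _ in range(T):
        grad_y = torch.autograd.grad(g(x, y), y)[0]
        y = (y - gamma * grad_y).detach().requires_grad_(True)
    return y

def B_k(x, y, z, f, g, M=10, beta=0.1):
    """Linear solver: M gradient steps on (d^2_yy g) z = d_y f, warm-started at the
    previous z (the AmIGO-GD choice; conjugate gradient would replace this loop,
    and non-warm-start AID variants would instead reset z to zero)."""
    b = torch.autograd.grad(f(x, y), y)[0].detach()
    z = z.detach().clone()
    for _ in range(M):
        z = z - beta * (hvp(g(x, y), y, z) - b)
    return z

def hypergrad(x, y, z, f, g):
    """Implicit gradient estimate  d_x f(x, y) - (d^2_xy g(x, y)) z."""
    f_x = torch.autograd.grad(f(x, y), x, allow_unused=True)[0]
    g_y = torch.autograd.grad(g(x, y), y, create_graph=True)[0]
    cross = torch.autograd.grad(g_y, x, grad_outputs=z)[0]
    return (f_x if f_x is not None else torch.zeros_like(x)) - cross
```

The AmIGO-GD vs. AmIGO-CG distinction quoted above is only about which solver runs inside B_k; warm-starting means passing the previous y and z back into A_k and B_k rather than re-initializing them (e.g., with z0 = 0 as in the plain AID baselines).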
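The Dataset Splits row quotes the outer and inner objectives as empirical averages of a loss over a validation set and a training set. The sketch below instantiates them for a toy regularized regression problem and runs the loop using the functions from the previous sketch; the 80/20 split, the exp(x) parameterization of the regularization weights, and all step sizes are illustrative assumptions, since the paper does not specify a split.

```python
# Toy instantiation of the quoted objectives: f averages an unregularized loss
# over D_val, g averages the same loss over D_tr plus an x-dependent ridge term.
import torch

def mean_squared_loss(y, data):
    X, t = data
    return ((X @ y - t) ** 2).mean()   # (1/|D|) * sum over D of L(y, xi)

torch.manual_seed(0)
X, t = torch.randn(100, 5), torch.randn(100)
D_tr, D_val = (X[:80], t[:80]), (X[80:], t[80:])   # assumed 80/20 split

f = lambda x, y: mean_squared_loss(y, D_val)                               # outer loss
g = lambda x, y: mean_squared_loss(y, D_tr) + (torch.exp(x) * y**2).sum()  # inner loss

# Outer loop tying together A_k, B_k, and hypergrad from the previous sketch.
x = torch.zeros(5, requires_grad=True)   # log regularization weights
y = torch.zeros(5, requires_grad=True)   # model parameters
z = torch.zeros(5)                       # linear-system iterate
for k in range(50):
    y = A_k(x, y, g, T=10, gamma=0.1)        # warm-started inner solve
    z = B_k(x, y, z, f, g, M=10, beta=0.1)   # warm-started linear solve
    x = (x - 0.1 * hypergrad(x, y, z, f, g)).detach().requires_grad_(True)
```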
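The grid quoted in the Experiment Setup row can be enumerated directly; the sketch below only builds the candidate configurations. The "best performing" selection criterion (e.g., lowest outer loss) is our reading, not a quoted detail.

```python
# Enumerate the quoted grid-search configurations: (T, M, beta_k) from
# {10, 20} x {5, 10} x {0.5, 10}, with batch sizes 10 x {0.1, 1, 2, 4}.
from itertools import product

T_grid, M_grid, beta_grid = (10, 20), (5, 10), (0.5, 10)
batch_grid = [int(10 * s) for s in (0.1, 1, 2, 4)]   # -> [1, 10, 20, 40]

configs = [dict(T=T, M=M, beta_k=beta, batch_size=b)
           for T, M, beta, b in product(T_grid, M_grid, beta_grid, batch_grid)]
print(len(configs))   # 32 candidate settings
# One would then run each setting and keep the best-performing one,
# e.g. by lowest outer (validation) loss.
```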