Debiasing a First-order Heuristic for Approximate Bi-level Optimization

Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Q Davis, Adrian Weller

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate the utility of UFOM in a synthetic experiment, data hypercleaning on MNIST (LeCun et al., 2010), and few-shot learning on CIFAR-100 (Krizhevsky et al., 2009) as well as Omniglot (Lake et al., 2011). Full proofs are provided in Appendix E in the Supplement." and, from Section 5 (Experiments): "We illustrate our theoretical findings on a synthetic experiment and then evaluate Adaptive UFOM on data hypercleaning and few-shot learning." |
| Researcher Affiliation | Collaboration | 1 University of Cambridge; 2 Google Research, Brain Team; 3 Columbia University; 4 DeepMind; 5 Stanford University; 6 The Alan Turing Institute |
| Pseudocode | Yes | "Algorithm 1: Outer SGD", "Algorithm 2: Inner GD (exact)", "Algorithm 3: Inner GD (FOM)", "Algorithm 4: Inner GD (UFOM)" (a structural sketch of this outer/inner loop follows the table) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | "data hypercleaning on MNIST (LeCun et al., 2010), and few-shot learning on CIFAR-100 (Krizhevsky et al., 2009) as well as Omniglot (Lake et al., 2011)" |
| Dataset Splits | Yes | "For that, we define $\theta \in \mathbb{R}^{5000}$, $\lvert \Omega_T \rvert = 1$ and the inner loss $L_{\mathrm{in}}$ has the form $L_{\mathrm{in}}(\theta, \phi, T) = \sum_{i=1}^{5000} \sigma(\theta^{(i)})\, l_{\mathrm{CCE}}(g(\phi, X_i), Y_i)$, where $\sigma(\cdot)$ is the sigmoid function, $\theta^{(i)}$ is the $i$th element of $\theta$ and $l_{\mathrm{CCE}}(\cdot, Y)$ is the categorical cross-entropy (CCE) with respect to a label $Y_i \in \{0, \ldots, 9\}$. $L_{\mathrm{out}}$ is defined as a cross entropy on the validation set." and "To sample from $p(T)$ in the K-shot m-way setting, $m$ classes are chosen randomly and $K+1$ examples are drawn from each class: $K$ examples for training and 1 for testing, i.e. $s = mK$, $t = m$." (see the loss and episode-sampling sketches after the table) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, or cloud computing specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | "As in (Shaban et al., 2018), we set $r = 100$ and $\alpha = 1$." and "We modify a setup of (Shaban et al., 2018) by using a two- instead of one-layer feedforward network, with ReLU nonlinearity..." and "For Adaptive UFOM, on a validation-score comparison we find that $q_{\min} = 0.05$, $\beta = 0.99$ performs reasonably well. Further, we empirically find that Adaptive UFOM works best when (22) is modified so that $D_k^2 = 0.1\, D_{\mathrm{sm},k}^2 / (1 - \beta^{k_{\mathrm{upd}}})$" and "We reuse convolutional architectures for $g(\phi, X)$ from (Finn et al., 2017) and set inner-loop length to $r = 10$, as in (Nichol et al., 2018)." (a bias-correction sketch follows the table) |
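
The four algorithms named in the Pseudocode row share an outer-SGD / inner-GD structure. The sketch below is a minimal, hedged illustration of that generic structure on a toy quadratic problem, not a reproduction of the paper's Algorithms 1-4: the losses, dimensions, and the stop-gradient rendering of the first-order heuristic (differentiate only the final inner step) are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

# Toy bi-level problem; A, the quadratic losses, and all sizes are
# illustrative assumptions, not the paper's tasks.
A = jax.random.normal(jax.random.PRNGKey(0), (5, 5))

def inner_loss(theta, phi):
    return 0.5 * jnp.sum((phi - A @ theta) ** 2)

def outer_loss(phi):
    return 0.5 * jnp.sum(phi ** 2)

def inner_gd(theta, phi, r=10, alpha=0.1):
    # Inner GD loop (shape of Algorithms 2-4): r gradient steps on L_in.
    for _ in range(r):
        phi = phi - alpha * jax.grad(inner_loss, argnums=1)(theta, phi)
    return phi

def outer_objective_exact(theta):
    # "Exact" variant: backpropagate through the whole inner trajectory.
    return outer_loss(inner_gd(theta, jnp.zeros(5)))

def outer_objective_fom(theta):
    # One common first-order heuristic: block gradients through all but
    # the last inner step, keeping only theta's direct effect on it.
    phi = jax.lax.stop_gradient(inner_gd(theta, jnp.zeros(5), r=9))
    phi = phi - 0.1 * jax.grad(inner_loss, argnums=1)(theta, phi)
    return outer_loss(phi)

theta = jnp.ones(5)
for _ in range(100):  # Outer SGD loop (shape of Algorithm 1)
    theta = theta - 0.05 * jax.grad(outer_objective_fom)(theta)
```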
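
The hypercleaning inner loss quoted in the Dataset Splits row weights each training example's cross-entropy by a sigmoid of its own hyperparameter $\theta^{(i)}$. Below is a minimal sketch of that loss; the linear stand-in for $g(\phi, X)$ and the random placeholder data are assumptions, only the weighting structure comes from the quote.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi, X, Y):
    logits = X @ phi                              # stand-in for g(phi, X_i)
    log_p = jax.nn.log_softmax(logits)            # (5000, 10)
    cce = -log_p[jnp.arange(Y.shape[0]), Y]       # per-example CCE
    weights = jax.nn.sigmoid(theta)               # sigma(theta^(i)) in [0, 1]
    return jnp.sum(weights * cce)

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (5000, 784))           # MNIST-sized placeholders
Y = jax.random.randint(key, (5000,), 0, 10)       # labels in {0, ..., 9}
theta = jnp.zeros(5000)                           # one weight per train point
phi = jnp.zeros((784, 10))
print(inner_loss(theta, phi, X, Y))
```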
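
The K-shot m-way sampling quoted in the same row draws $m$ classes and $K+1$ examples per class, giving $s = mK$ training and $t = m$ test examples per episode. A sketch under an assumed class-indexed dataset layout:

```python
import random

def sample_episode(data_by_class, m=5, K=1):
    """Draw m classes, then K+1 examples each: K for training, 1 for test."""
    classes = random.sample(sorted(data_by_class), m)
    train, test = [], []
    for c in classes:
        examples = random.sample(data_by_class[c], K + 1)
        train += [(x, c) for x in examples[:K]]   # K training examples
        test.append((examples[K], c))             # 1 test example
    assert len(train) == m * K and len(test) == m  # s = mK, t = m
    return train, test

toy = {c: list(range(10)) for c in range(20)}     # 20 classes, 10 items each
train_set, test_set = sample_episode(toy)
```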
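
The modification of (22) quoted in the Experiment Setup row divides a smoothed quantity $D_{\mathrm{sm},k}^2$ by $1 - \beta^{k_{\mathrm{upd}}}$, which reads as Adam-style bias correction of an exponential moving average. The sketch below assumes that EMA update rule (it is not quoted); only the $0.1\, D_{\mathrm{sm},k}^2 / (1 - \beta^{k_{\mathrm{upd}}})$ correction comes from the text.

```python
class AdaptiveD2:
    """Bias-corrected EMA of squared estimates. The EMA update itself is
    an assumption; the correction mirrors the quoted modified (22)."""

    def __init__(self, beta=0.99):
        self.beta, self.d2_sm, self.k_upd = beta, 0.0, 0

    def update(self, new_value):
        self.k_upd += 1
        self.d2_sm = self.beta * self.d2_sm + (1 - self.beta) * new_value
        # D_k^2 = 0.1 * D_sm,k^2 / (1 - beta^k_upd), per the quote
        return 0.1 * self.d2_sm / (1 - self.beta ** self.k_upd)

est = AdaptiveD2()
for v in (4.0, 3.5, 5.2):  # illustrative stream of estimates
    print(est.update(v))
```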