Debiasing a First-order Heuristic for Approximate Bi-level Optimization
Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Q Davis, Adrian Weller
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the utility of UFOM in a synthetic experiment, data hypercleaning on MNIST (LeCun et al., 2010), and few-shot learning on CIFAR100 (Krizhevsky et al., 2009) as well as Omniglot (Lake et al., 2011). Full proofs are provided in Appendix E in the Supplement. and (Section 5, Experiments) We illustrate our theoretical findings on a synthetic experiment and then evaluate Adaptive UFOM on data hypercleaning and few-shot learning. |
| Researcher Affiliation | Collaboration | 1 University of Cambridge; 2 Google Research, Brain Team; 3 Columbia University; 4 DeepMind; 5 Stanford University; 6 The Alan Turing Institute. |
| Pseudocode | Yes | Algorithm 1: Outer SGD; Algorithm 2: Inner GD (exact); Algorithm 3: Inner GD (FOM); Algorithm 4: Inner GD (UFOM). (A sketch of this outer/inner structure appears after the table.) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | data hypercleaning on MNIST (LeCun et al., 2010), and few-shot learning on CIFAR100 (Krizhevsky et al., 2009) as well as Omniglot (Lake et al., 2011). |
| Dataset Splits | Yes | For that, we define $\theta \in \mathbb{R}^{5000}$, $\lvert \Omega_T \rvert = 1$ and the inner loss $L^{in}$ has the form $L^{in}(\theta, \phi, T) = \sum_{i=1}^{5000} \sigma(\theta^{(i)})\, l_{CCE}(g(\phi, X_i), Y_i)$, where $\sigma(\cdot)$ is a sigmoid function, $\theta^{(i)}$ is the $i$th element of $\theta$ and $l_{CCE}(\cdot, Y_i)$ is a categorical cross entropy (CCE) with respect to a label $Y_i \in \{0, \ldots, 9\}$. $L^{out}$ is defined as a cross entropy on the validation set. and To sample from $p(T)$ in the K-shot m-way setting, $m$ classes are chosen randomly and $K+1$ examples are drawn from each class: $K$ examples for training and 1 for testing, i.e. $s = mK$, $t = m$. (Both quoted setups are sketched in code after the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud computing specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the experiments. |
| Experiment Setup | Yes | As in (Shaban et al., 2018), we set $r = 100$ and $\alpha = 1$. and We modify a setup of (Shaban et al., 2018) by using a two- instead of one-layer feedforward network, with ReLU nonlinearity... and For Adaptive UFOM, on a validation score comparison we find that $q_{\min} = 0.05$, $\beta = 0.99$ performs reasonably well. Further, we empirically find that Adaptive UFOM works best when (22) is modified so that $D^2_k = 0.1\, D^2_{sm,k} / (1 - \beta^{k_{upd}})$ and We reuse convolutional architectures for $g(\phi, X)$ from (Finn et al., 2017) and set inner-loop length to $r = 10$, as in (Nichol et al., 2018). (A sketch of the bias-corrected smoothing in the modified Eq. (22) appears after the table.) |
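The pseudocode row describes a standard bilevel structure: an outer SGD loop over $\theta$ (Algorithm 1) wrapping $r$ inner GD steps over $\phi$ (Algorithms 2-4, which differ only in how the hypergradient is estimated). The sketch below is a minimal JAX illustration of the exact variant, not the authors' code: `inner_loss`, `outer_loss`, the step sizes, and all shapes are placeholder assumptions.

```python
import jax
import jax.numpy as jnp

def inner_loss(theta, phi):
    # Placeholder L_in(theta, phi); the paper's inner losses are task-specific.
    return jnp.sum(jax.nn.sigmoid(theta) * (phi - 1.0) ** 2)

def outer_loss(phi):
    # Placeholder L_out; in hypercleaning this is validation cross entropy.
    return jnp.sum(phi ** 2)

def inner_gd(theta, phi0, r=100, alpha=1.0):
    # Algorithm-2-style inner loop: r gradient descent steps on phi.
    phi = phi0
    for _ in range(r):
        phi = phi - alpha * jax.grad(inner_loss, argnums=1)(theta, phi)
    return phi

# Exact hypergradient: backpropagate L_out(phi_r(theta)) through the whole
# unrolled inner trajectory. FOM and UFOM (Algorithms 3 and 4) replace this
# with cheaper estimates -- biased and unbiased, respectively.
hypergrad = jax.grad(lambda th: outer_loss(inner_gd(th, jnp.zeros(5))))

theta = jnp.zeros(5)
theta = theta - 0.1 * hypergrad(theta)  # one outer SGD step (Algorithm 1)
```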
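The first quoted split defines the hypercleaning inner loss explicitly, so it translates almost line-for-line into code. In this hedged sketch, $g(\phi, X)$ is swapped for a linear classifier (the paper uses a small feedforward network), and shapes are assumed: `X` is `(5000, d)`, `Y` holds labels in $\{0, \ldots, 9\}$, and `theta` has one entry per training example.

```python
import jax.numpy as jnp
from jax.nn import sigmoid, log_softmax

def hypercleaning_inner_loss(theta, phi, X, Y):
    """Sketch of L_in(theta, phi) = sum_i sigmoid(theta^(i)) * l_CCE(g(phi, X_i), Y_i).
    g is a hypothetical linear classifier here, not the paper's network."""
    logits = X @ phi                          # placeholder g(phi, X)
    log_p = log_softmax(logits, axis=-1)
    cce = -log_p[jnp.arange(X.shape[0]), Y]   # per-example cross entropy
    return jnp.sum(sigmoid(theta) * cce)      # sigmoid(theta_i) down-weights noisy labels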
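The second quoted split is a sampling protocol rather than a formula. The sketch below implements it under the assumption that features and labels are plain NumPy arrays; the function name and defaults are illustrative.

```python
import numpy as np

def sample_episode(features, labels, m=5, K=1, rng=None):
    """K-shot m-way episode, as in the quoted split: draw m classes and
    K + 1 examples per class; K go to the inner (training) set and 1 to
    the outer (test) set, so s = m*K and t = m."""
    rng = rng or np.random.default_rng()
    classes = rng.choice(np.unique(labels), size=m, replace=False)
    train_idx, test_idx = [], []
    for c in classes:
        idx = rng.choice(np.flatnonzero(labels == c), size=K + 1, replace=False)
        train_idx.extend(idx[:K])   # K training examples
        test_idx.append(idx[K])     # 1 held-out test example
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])
```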
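The modified Eq. (22), $D^2_k = 0.1\, D^2_{sm,k} / (1 - \beta^{k_{upd}})$, reads like an Adam-style bias correction applied to an exponentially smoothed statistic. The sketch below shows that interpretation; what $D^2_{sm,k}$ tracks and what $k_{upd}$ counts are assumptions, not the paper's implementation.

```python
# Hedged sketch of the modified Eq. (22). Assumption: D2_sm is an exponential
# moving average (initialized at zero) of a squared statistic d2 observed at
# each update, and dividing by (1 - beta**k_upd) is the standard correction
# for the EMA's zero initialization. beta = 0.99 and q_min = 0.05 are the
# values reported in the quote.
beta = 0.99
q_min = 0.05

def make_d2_tracker(beta=beta):
    state = {"D2_sm": 0.0, "k_upd": 0}
    def update(d2):
        state["k_upd"] += 1
        state["D2_sm"] = beta * state["D2_sm"] + (1.0 - beta) * d2
        return 0.1 * state["D2_sm"] / (1.0 - beta ** state["k_upd"])
    return update
```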