Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Online Hyperparameter Meta-Learning with Hypergradient Distillation
Authors: Hae Beom Lee, Hayeon Lee, JaeWoong Shin, Eunho Yang, Timothy Hospedales, Sung Ju Hwang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our method on two different meta-learning methods and three benchmark datasets. |
| Researcher Affiliation | Collaboration | KAIST (1), AITRICS (2), Lunit (3), South Korea; University of Edinburgh (4), Samsung AI Centre, Cambridge (5), United Kingdom |
| Pseudocode | Yes | Algorithm 1 Reverse-HG (RMD)... Algorithm 2 DrMAD (Fu et al., 2016)... Algorithm 3 HyperDistill... Algorithm 4 LinearEstimation(γ, λ, φ) |
| Open Source Code | Yes | Code is publicly available at: https://github.com/haebeom-lee/hyperdistill |
| Open Datasets | Yes | 1) TinyImageNet (Le & Yang, 2015). This dataset contains 200 classes of general categories... 2) CIFAR100 (Krizhevsky et al., 2009). This dataset contains 100 classes of general categories. |
| Dataset Splits | Yes | 1) TinyImageNet (Le & Yang, 2015)... We split them into 100, 40, and 60 classes for meta-training, meta-validation, and meta-test. 2) CIFAR100 (Krizhevsky et al., 2009)... We split them into 50, 20, and 30 classes for meta-training, meta-validation, and meta-test. |
| Hardware Specification | Yes | We used RTX 2080 Ti for the measurements. |
| Software Dependencies | No | The paper mentions optimizers like SGD and Adam (Kingma & Ba, 2015) but does not specify versions for core software libraries or frameworks (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | Meta-training: For inner-optimization of the weights, we use SGD with momentum 0.9 and set the learning rate µ_Inner = 0.1 for MetaWeightNet and µ_Inner = 0.01 for the others. The number of inner-steps is T = 100 and batchsize is 100. We use random cropping and horizontal flipping as data augmentations. For the hyperparameter optimization, we also use SGD with momentum 0.9 with learning rate µ_Hyper = 0.01 for MetaWeightNet and µ_Hyper = 0.001 for the others, which we linearly decay toward 0 over total M = 1000 inner-optimizations. We perform parallel meta-learning with meta-batchsize set to 4. |
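
The Experiment Setup row can be read as a small configuration plus a linear learning-rate schedule. The sketch below is illustrative only and is not taken from the authors' repository: the names `MetaTrainConfig` and `hyper_lr_at` are hypothetical, the defaults are the values quoted for "the others" (not the 0.1 / 0.01 variant), and it assumes the decay is indexed by the number of completed inner-optimizations, which the quoted text does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaTrainConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    inner_lr: float = 0.01        # inner learning rate for "the others"
    hyper_lr: float = 0.001       # initial hyperparameter learning rate
    momentum: float = 0.9         # SGD momentum (inner and hyper)
    inner_steps: int = 100        # T
    batch_size: int = 100
    total_inner_opts: int = 1000  # M
    meta_batch_size: int = 4


def hyper_lr_at(m: int, cfg: MetaTrainConfig) -> float:
    """Linearly decay the hyper learning rate toward 0 over M inner-optimizations.

    Assumption: m counts completed inner-optimizations, 0 <= m <= M.
    """
    if not 0 <= m <= cfg.total_inner_opts:
        raise ValueError(f"m must be in [0, {cfg.total_inner_opts}], got {m}")
    return cfg.hyper_lr * (1.0 - m / cfg.total_inner_opts)


cfg = MetaTrainConfig()
schedule = [hyper_lr_at(m, cfg) for m in (0, 250, 500, 750, 1000)]
```

Under these assumptions the schedule starts at the initial rate and reaches exactly 0 after the final inner-optimization; a per-step rather than per-episode decay would need a different index.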