Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Teaching with Commentaries
Authors: Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training. |
| Researcher Affiliation | Collaboration | Aniruddh Raghu (MIT); Maithra Raghu (Google Research); Simon Kornblith (Google Research); David Duvenaud (Google Research & University of Toronto); Geoffrey Hinton (Google Research & University of Toronto) |
| Pseudocode | Yes | Algorithm 1 Commentary Learning through Backpropagation Through Training. Algorithm 2 Commentary Learning through Implicit Differentiation. |
| Open Source Code | Yes | Code at https://github.com/googleinterns/commentaries |
| Open Datasets | Yes | We first learn example weight curriculum commentaries on a synthetic MNIST binary classification problem... We now learn example weighting curriculum commentaries on CIFAR10 and CIFAR100... We evaluate a standard MAML baseline and our commentary variant on standard few-shot learning benchmarks: (i) training/testing on MiniImageNet (MIN); and (ii) training on MIN and testing on CUB-200-2011 (CUB)... Augmentation Commentaries on MNIST... We learn commentary attention masks on a variety of image datasets: an MNIST variant, CIFAR10/100, medical chest X-rays, and Caltech-UCSD Birds (CUB)-200-2011... |
| Dataset Splits | Yes | Dataset: Both the overlapping and non-overlapping datasets are generated to have 10000 training examples, 5000 validation examples, and 5000 test examples. |
| Hardware Specification | No | The paper mentions "GPU memory constraints" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions "higher library (Grefenstette et al., 2019)" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Training details: We train both networks using the Adam optimiser, with a learning rate of 1e-4 for the student, and 1e-3 for the commentary network. The student network is trained for 500 inner optimisation steps, with a batch size of 10. We train for 20 commentary network iterations. |
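
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which makes the implied training budget easy to check. This is a minimal sketch, not the authors' code: the class name, field names, and helper methods are hypothetical, and the step count assumes that each commentary network iteration runs one full inner loop of student training.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommentaryTrainingConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    optimiser: str = "adam"         # both networks are trained with Adam
    student_lr: float = 1e-4        # learning rate for the student network
    commentary_lr: float = 1e-3     # learning rate for the commentary network
    inner_steps: int = 500          # student optimisation steps per inner loop
    batch_size: int = 10            # examples per student step
    commentary_iterations: int = 20 # outer updates of the commentary network

    def total_student_steps(self) -> int:
        # Assumption: one full inner loop per commentary iteration.
        return self.inner_steps * self.commentary_iterations

    def examples_per_inner_loop(self) -> int:
        # Examples processed by the student in a single inner loop.
        return self.inner_steps * self.batch_size


config = CommentaryTrainingConfig()
print(config.total_student_steps())      # 500 * 20 = 10000
print(config.examples_per_inner_loop())  # 500 * 10 = 5000
```

Under that assumption, the quoted settings correspond to 10,000 student steps in total, with 5,000 examples processed per inner loop.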