Teaching with Commentaries
Authors: Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training. (A minimal sketch of this reuse pattern follows the table.) |
| Researcher Affiliation | Collaboration | Aniruddh Raghu (MIT, araghu@mit.edu); Maithra Raghu (Google Research); Simon Kornblith (Google Research); David Duvenaud (Google Research & University of Toronto); Geoffrey Hinton (Google Research & University of Toronto) |
| Pseudocode | Yes | Algorithm 1: Commentary Learning through Backpropagation Through Training. Algorithm 2: Commentary Learning through Implicit Differentiation. (A hedged sketch of Algorithm 1 follows the table.) |
| Open Source Code | Yes | Code at https://github.com/googleinterns/commentaries |
| Open Datasets | Yes | We first learn example weight curriculum commentaries on a synthetic MNIST binary classification problem... We now learn example weighting curriculum commentaries on CIFAR10 and CIFAR100... We evaluate a standard MAML baseline and our commentary variant on standard few-shot learning benchmarks: (i) training/testing on MiniImageNet (MIN); and (ii) training on MIN and testing on CUB-200-2011 (CUB)... Augmentation Commentaries on MNIST... We learn commentary attention masks on a variety of image datasets: an MNIST variant, CIFAR10/100, medical chest X-rays, and Caltech-UCSD Birds (CUB)-200-2011... |
| Dataset Splits | Yes | Dataset: Both the overlapping and non-overlapping datasets are generated to have 10000 training examples, 5000 validation examples, and 5000 test examples. |
| Hardware Specification | No | The paper mentions "GPU memory constraints" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for experiments. |
| Software Dependencies | No | The paper mentions "higher library (Grefenstette et al., 2019)" but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Training details: We train both networks using the Adam optimiser, with a learning rate of 1e-4 for the student, and 1e-3 for the commentary network. The student network is trained for 500 inner optimisation steps, with a batch size of 10. We train for 20 commentary network iterations. |
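
The Pseudocode and Experiment Setup rows together outline how a commentary is learned. The sketch below is a minimal, non-authoritative rendering of Algorithm 1 (commentary learning through backpropagation through training) for an example-weighting commentary, using the `higher` library named under Software Dependencies and the hyperparameters quoted under Experiment Setup. `make_student`, `make_commentary_net`, `train_iter`, and `val_iter` are illustrative assumptions, and the paper's example-weighting commentary may additionally condition on the training iteration, which is omitted here.

```python
import torch
import torch.nn.functional as F
import higher  # Grefenstette et al., 2019; version not specified in the paper

student = make_student()            # hypothetical constructor for the student network
commentary = make_commentary_net()  # hypothetical: maps a batch (x, y) to per-example weights
student_opt = torch.optim.Adam(student.parameters(), lr=1e-4)        # student learning rate
commentary_opt = torch.optim.Adam(commentary.parameters(), lr=1e-3)  # commentary learning rate

for outer_step in range(20):  # 20 commentary network iterations
    commentary_opt.zero_grad()
    # Unroll the student's training so gradients flow back to the commentary parameters;
    # the student restarts from its initial weights at every outer iteration.
    with higher.innerloop_ctx(student, student_opt,
                              copy_initial_weights=True) as (fstudent, diffopt):
        for inner_step in range(500):      # 500 inner optimisation steps
            x, y = next(train_iter)        # assumed iterator over training batches of size 10
            weights = commentary(x, y)     # example-weighting commentary
            per_example = F.cross_entropy(fstudent(x), y, reduction='none')
            diffopt.step((weights * per_example).mean())
        # Backpropagate the student's validation loss through the unrolled training.
        x_val, y_val = next(val_iter)      # assumed iterator over validation batches
        F.cross_entropy(fstudent(x_val), y_val).backward()
    commentary_opt.step()
```

Unrolling all 500 inner steps keeps the full training trajectory in memory, which is consistent with the GPU memory constraints the paper mentions; Algorithm 2 (implicit differentiation) is the paper's alternative when such unrolling is impractical.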
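
The reuse use-case noted under Research Type (storing a learned commentary with a dataset and applying it when training new models) reduces to ordinary training with frozen commentary outputs. A minimal sketch, again assuming an example-weighting commentary and with a hypothetical checkpoint name:

```python
import torch
import torch.nn.functional as F

# Load a previously learned example-weighting commentary (hypothetical checkpoint name).
commentary = make_commentary_net()
commentary.load_state_dict(torch.load('commentary.pt'))
commentary.eval()

new_model = make_student()  # a fresh model to be trained with the stored commentary
opt = torch.optim.Adam(new_model.parameters(), lr=1e-4)

for x, y in train_loader:   # assumed DataLoader over the same dataset
    with torch.no_grad():
        weights = commentary(x, y)  # frozen commentary; no gradients needed
    per_example = F.cross_entropy(new_model(x), y, reduction='none')
    loss = (weights * per_example).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```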