Teaching with Commentaries

Authors: Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training. (A minimal sketch of an example-weighting commentary network appears after this table.)
Researcher Affiliation | Collaboration | Aniruddh Raghu (MIT, araghu@mit.edu); Maithra Raghu (Google Research); Simon Kornblith (Google Research); David Duvenaud (Google Research & University of Toronto); Geoffrey Hinton (Google Research & University of Toronto)
Pseudocode | Yes | Algorithm 1: Commentary Learning through Backpropagation Through Training. Algorithm 2: Commentary Learning through Implicit Differentiation. (A sketch of the backpropagation-through-training loop appears after this table.)
Open Source Code | Yes | Code at https://github.com/googleinterns/commentaries
Open Datasets | Yes | We first learn example weight curriculum commentaries on a synthetic MNIST binary classification problem... We now learn example weighting curriculum commentaries on CIFAR10 and CIFAR100... We evaluate a standard MAML baseline and our commentary variant on standard few-shot learning benchmarks: (i) training/testing on MiniImageNet (MIN); and (ii) training on MIN and testing on CUB-200-2011 (CUB)... Augmentation Commentaries on MNIST... We learn commentary attention masks on a variety of image datasets: an MNIST variant, CIFAR10/100, medical chest X-rays, and Caltech-UCSD Birds (CUB)-200-2011...
Dataset Splits | Yes | Dataset: Both the overlapping and non-overlapping datasets are generated to have 10000 training examples, 5000 validation examples, and 5000 test examples. (A small split example follows the table.)
Hardware Specification | No | The paper mentions "GPU memory constraints" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for experiments.
Software Dependencies | No | The paper mentions the "higher" library (Grefenstette et al., 2019) but does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | Training details: We train both networks using the Adam optimiser, with a learning rate of 1e-4 for the student and 1e-3 for the commentary network. The student network is trained for 500 inner optimisation steps, with a batch size of 10. We train for 20 commentary network iterations. (These settings are used in the training-loop sketch after this table.)
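
The Research Type row describes commentaries that weight training examples. Below is a minimal sketch of how such a commentary network might be parameterised; the class name `ExampleWeightCommentary`, its architecture, and the sigmoid weighting are illustrative assumptions, not the authors' implementation (which is in the linked repository).

```python
import torch
import torch.nn as nn

class ExampleWeightCommentary(nn.Module):
    """Hypothetical commentary network: maps an image and the current training
    progress to a per-example weight in (0, 1) for that example's loss."""

    def __init__(self, in_channels: int = 1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32 + 1, 1)  # +1 for the normalised iteration input

    def forward(self, x: torch.Tensor, iteration_frac: float) -> torch.Tensor:
        h = self.features(x)
        it = torch.full((h.shape[0], 1), float(iteration_frac), device=h.device)
        return torch.sigmoid(self.head(torch.cat([h, it], dim=1)))
```

The returned weights would multiply the per-example student loss, so examples the commentary deems unhelpful at a given stage of training contribute less to the student's gradient.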
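The Pseudocode and Experiment Setup rows together suggest the following sketch of Algorithm 1 (backpropagation through training), written against the `higher` library named in the Software Dependencies row and reusing the commentary interface sketched above. The function name, the loader handling, and re-initialising the student every outer iteration are assumptions; only the optimiser choice and the hyperparameters (Adam, student lr 1e-4, commentary lr 1e-3, 500 inner steps, batch size 10, 20 commentary iterations) come from the reported training details.

```python
import itertools

import higher
import torch
import torch.nn.functional as F

def train_commentary(student_fn, commentary, train_loader, val_loader,
                     outer_iters=20, inner_steps=500,
                     student_lr=1e-4, commentary_lr=1e-3):
    """Sketch of commentary learning by backpropagating through student training."""
    commentary_opt = torch.optim.Adam(commentary.parameters(), lr=commentary_lr)

    for _ in range(outer_iters):
        student = student_fn()  # assumed: a fresh student per commentary iteration
        inner_opt = torch.optim.Adam(student.parameters(), lr=student_lr)
        train_iter = itertools.cycle(train_loader)  # batch size 10 set in the loader

        commentary_opt.zero_grad()
        # higher provides a functional copy of the student whose unrolled updates
        # stay in the autograd graph, so the validation loss can be differentiated
        # with respect to the commentary parameters.
        with higher.innerloop_ctx(student, inner_opt,
                                  copy_initial_weights=True) as (fstudent, diffopt):
            for step in range(inner_steps):
                x, y = next(train_iter)
                weights = commentary(x, step / inner_steps)              # (B, 1)
                per_example = F.cross_entropy(fstudent(x), y, reduction='none')
                diffopt.step((weights.squeeze(1) * per_example).mean())

            # Meta-objective: validation loss of the unrolled student.
            x_val, y_val = next(iter(val_loader))
            val_loss = F.cross_entropy(fstudent(x_val), y_val)
            val_loss.backward()  # gradient flows back through all inner steps

        commentary_opt.step()
    return commentary
```

Unrolling 500 differentiable Adam steps keeps the whole inner trajectory in memory, which is presumably the kind of cost behind the "GPU memory constraints" noted in the Hardware row and the reason the paper also gives an implicit-differentiation variant (Algorithm 2).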
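Separately, the Dataset Splits row translates directly into a split call. The stand-in tensors below are only placeholders for the synthetic binary-MNIST data, whose construction the row does not describe, and the seed is an arbitrary choice.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Placeholder for the 20,000-example synthetic dataset (construction not shown).
full_dataset = TensorDataset(torch.randn(20000, 1, 28, 28),
                             torch.randint(0, 2, (20000,)))

# 10000 train / 5000 validation / 5000 test, as reported.
train_set, val_set, test_set = random_split(
    full_dataset, [10000, 5000, 5000],
    generator=torch.Generator().manual_seed(0))
```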