reproducibilityindex.ai

Teaching with Commentaries

Authors: Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

ICLR 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We explore diverse applications of commentaries, from weighting training examples, to parameterising label-dependent data augmentation policies, to representing attention masks that highlight salient image regions. We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process. We also observe that commentaries generalise: they can be reused when training new models to obtain performance benefits, suggesting a use-case where commentaries are stored with a dataset and leveraged in future for improved model training.
Researcher Affiliation	Collaboration	Aniruddh Raghu (MIT, araghu@mit.edu); Maithra Raghu (Google Research); Simon Kornblith (Google Research); David Duvenaud (Google Research & University of Toronto); Geoffrey Hinton (Google Research & University of Toronto)
Pseudocode	Yes	Algorithm 1 Commentary Learning through Backpropagation Through Training. Algorithm 2 Commentary Learning through Implicit Differentiation.
Open Source Code	Yes	Code at https://github.com/googleinterns/commentaries
Open Datasets	Yes	We first learn example weight curriculum commentaries on a synthetic MNIST binary classification problem... We now learn example weighting curriculum commentaries on CIFAR10 and CIFAR100... We evaluate a standard MAML baseline and our commentary variant on standard few-shot learning benchmarks: (i) training/testing on MiniImageNet (MIN); and (ii) training on MIN and testing on CUB-200-2011 (CUB)... Augmentation Commentaries on MNIST... We learn commentary attention masks on a variety of image datasets: an MNIST variant, CIFAR10/100, medical chest X-rays, and Caltech-UCSD Birds (CUB)-200-2011...
Dataset Splits	Yes	Dataset: Both the overlapping and non-overlapping datasets are generated to have 10000 training examples, 5000 validation examples, and 5000 test examples.
Hardware Specification	No	The paper mentions "GPU memory constraints" but does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for experiments.
Software Dependencies	No	The paper mentions "higher library (Grefenstette et al., 2019)" but does not provide specific version numbers for any software dependencies.
Experiment Setup	Yes	Training details: We train both networks using the Adam optimiser, with a learning rate of 1e-4 for the student, and 1e-3 for the commentary network. The student network is trained for 500 inner optimisation steps, with a batch size of 10. We train for 20 commentary network iterations.
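
As a rough illustration of the Experiment Setup row, the quoted hyperparameters can be collected in a small Python configuration. This is a minimal sketch, not the authors' code: the class, field, and method names are hypothetical, and the derived totals assume that each commentary network iteration runs one full 500-step student inner loop.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    All field names are hypothetical; only the values come from the row.
    """
    optimizer: str = "Adam"
    student_lr: float = 1e-4          # learning rate for the student network
    commentary_lr: float = 1e-3       # learning rate for the commentary network
    inner_steps: int = 500            # student inner optimisation steps
    batch_size: int = 10              # student batch size
    commentary_iterations: int = 20   # commentary network iterations

    def examples_per_inner_loop(self) -> int:
        # Number of examples the student draws during one inner loop.
        return self.inner_steps * self.batch_size

    def total_student_steps(self) -> int:
        # Assumption: every commentary iteration runs one full inner loop.
        return self.commentary_iterations * self.inner_steps


setup = ExperimentSetup()
print(setup.examples_per_inner_loop())
print(setup.total_student_steps())
```

Under that assumption, the student draws 5000 examples per inner loop and takes 10000 optimisation steps across the whole run.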