An Analytical Theory of Curriculum Learning in Teacher-Student Networks
Authors: Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically derive average learning trajectories for simple neural networks on this task, which establish a clear speed benefit for curriculum learning in the online setting. However, when training experiences can be stored and replayed, the advantage of curriculum in standard neural networks disappears, in line with observations from the deep learning literature. ... We derive generalisation performance as a function of consolidation strength (implemented as an L2 regularisation/elastic coupling connecting learning phases), and show that curriculum-aware algorithms can yield a large improvement in test performance. ... To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. As shown in Fig. 7b, curriculum improves performance, particularly when easy examples make up a large proportion of the dataset, confirming that curricula that reduce clutter can benefit learning. (A minimal sketch of this input construction follows the table.) |
| Researcher Affiliation | Collaboration | Department of Computing Sciences, Bocconi University. Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, University College London. FAIR, Meta AI. (Equal contributions.) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The architecture used is very simple to simulate and we do not provide code. |
| Open Datasets | Yes | To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). ... We use and cite the CIFAR10 dataset. |
| Dataset Splits | No | Full parameters for the experiments on real data are given in the SM. |
| Hardware Specification | No | We report estimated total amount of compute and type of compute in the SM for the experiment on real data (~10000 GPU hours, ~1110 kg CO2 eq). |
| Software Dependencies | No | We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. (A minimal Lightning sketch appears below the table.) |
| Experiment Setup | Yes | We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. We optimised hyperparameters in each curriculum phase separately. We trained all combinations of five elastic penalties log-spaced between 1e-3 and 1e2, and weight decay parameters {0, 0.2, 0.5}. (A sketch of this objective and grid appears below the table.) |
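The cluttered task quoted in the "Research Type" and "Open Datasets" rows patches two CIFAR-10 images side by side into one 32×64 input. Since the authors released no code, the sketch below is only one plausible reading of that construction; the function name `make_cluttered_example` and the convention that the left image carries the label are assumptions, not the paper's implementation.

```python
# Minimal sketch of the cluttered CIFAR-10 input described in the paper:
# two 32x32 images concatenated along the width into one 32x64 input, with
# the target class taken from one of them (assumed here: the left image).
import torch
from torchvision import datasets, transforms

cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())

def make_cluttered_example(dataset, idx_target, idx_clutter):
    """Patch a labelled target image and an unlabelled clutter image together."""
    x_t, y_t = dataset[idx_target]    # (3, 32, 32), label kept
    x_c, _ = dataset[idx_clutter]     # clutter image, label discarded
    x = torch.cat([x_t, x_c], dim=2)  # concatenate widths -> (3, 32, 64)
    return x, y_t

x, y = make_cluttered_example(cifar, 0, 1)
assert x.shape == (3, 32, 64)
```

An "easy" example in this setup would presumably carry an uninformative clutter patch; the exact easy/hard construction is deferred to the paper's SM.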
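The "Software Dependencies" row only establishes that a single-layer network was trained with the cross-entropy loss in PyTorch Lightning. A minimal Lightning module consistent with that description is sketched below; the SGD optimiser and learning rate are assumptions, since the full training parameters are deferred to the SM.

```python
# Hypothetical single-layer classifier matching the quoted description;
# the optimiser choice and learning rate are assumed, not taken from the paper.
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class SingleLayerNet(pl.LightningModule):
    def __init__(self, in_dim=3 * 32 * 64, n_classes=10, lr=1e-2, wd=0.0):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, n_classes)
        self.lr, self.wd = lr, wd

    def forward(self, x):
        # Flatten the (3, 32, 64) cluttered input into one vector per example.
        return self.linear(x.flatten(1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr,
                               weight_decay=self.wd)
```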
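The "Experiment Setup" row describes an elastic (Gaussian-prior) L2 coupling connecting the two curriculum phases, searched over five log-spaced elastic penalties and three weight-decay values. The objective below is inferred from that description; `phase2_loss` and the loop placeholder are hypothetical names, and the exact coupling form used by the authors may differ.

```python
# Sketch of a phase-2 objective with elastic coupling to the phase-1
# solution, plus the quoted hyperparameter grid. Names are illustrative.
import itertools
import torch
import torch.nn.functional as F

elastic_penalties = torch.logspace(-3, 2, steps=5).tolist()  # five values, 1e-3 .. 1e2
weight_decays = [0.0, 0.2, 0.5]

def phase2_loss(logits, targets, w, w_phase1, lam, wd):
    ce = F.cross_entropy(logits, targets)
    elastic = 0.5 * lam * torch.sum((w - w_phase1) ** 2)  # Gaussian prior centred on phase-1 weights
    decay = 0.5 * wd * torch.sum(w ** 2)                  # standard weight decay
    return ce + elastic + decay

for lam, wd in itertools.product(elastic_penalties, weight_decays):
    ...  # train a fresh phase-2 model with this (lam, wd) combination
```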