An Analytical Theory of Curriculum Learning in Teacher-Student Networks

Authors: Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We analytically derive average learning trajectories for simple neural networks on this task, which establish a clear speed benefit for curriculum learning in the online setting. However, when training experiences can be stored and replayed, the advantage of curriculum in standard neural networks disappears, in line with observations from the deep learning literature. ... We derive generalisation performance as a function of consolidation strength (implemented as an L2 regularisation/elastic coupling connecting learning phases), and show that curriculum-aware algorithms can yield a large improvement in test performance. ... To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). We train a single-layer network with the cross-entropy loss and the curriculum protocol with a Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. As shown in Fig. 7b, curriculum improves performance, particularly when easy examples make up a large proportion of the dataset, confirming that curricula that reduce clutter can benefit learning.
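The elastic coupling quoted above amounts to an L2 pull toward the weights learned in the previous curriculum phase, which is equivalent to a Gaussian prior centred on those weights. Below is a minimal PyTorch sketch, assuming a generic model; the function name elastic_penalty and its argument names are illustrative, not taken from the paper.

```python
import torch

def elastic_penalty(model, anchor_params, strength):
    # L2 "elastic" coupling: penalise deviation from the weights saved
    # at the end of the previous curriculum phase. This is equivalent
    # to a Gaussian prior centred on those weights, with precision
    # proportional to the consolidation strength.
    penalty = torch.zeros(())
    for p, a in zip(model.parameters(), anchor_params):
        penalty = penalty + ((p - a.detach()) ** 2).sum()
    return 0.5 * strength * penalty

# In the second curriculum phase, the training objective would then be
#   loss = cross_entropy(logits, targets) + elastic_penalty(model, anchor, lam)
# where `anchor` holds copies of the phase-1 weights and `lam` is the
# consolidation strength swept in the paper.
```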
Researcher Affiliation | Collaboration | Department of Computing Sciences, Bocconi University; Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, University College London; FAIR, Meta AI. Equal contributions.
Pseudocode | No | No pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The architecture used is very simple to simulate, and we do not provide code.
Open Datasets | Yes | To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). ... We use and cite the CIFAR10 dataset.
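For concreteness, here is a hypothetical sketch of the cluttered-input construction using torchvision. The target/distractor pairing and the helper name make_cluttered_pair are assumptions; the paper's exact sampling protocol is given in its SM.

```python
import torch
from torchvision import datasets, transforms

# Load CIFAR10; each image is a 3 x 32 x 32 tensor.
cifar = datasets.CIFAR10(root="data", train=True, download=True,
                         transform=transforms.ToTensor())

def make_cluttered_pair(i, j):
    # Patch a target image and a distractor side by side along the
    # width dimension, giving the 32x64 input described in Fig. 7a.
    x_target, y = cifar[i]
    x_clutter, _ = cifar[j]   # distractor label is ignored
    return torch.cat([x_target, x_clutter], dim=2), y
```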
Dataset Splits | No | Full parameters for the experiments on real data are given in the SM.
Hardware Specification | No | We report the estimated total amount of compute and type of compute in the SM for the experiment on real data (~10000 GPU hours, 1110 kg CO2 eq).
Software Dependencies | No | We train a single-layer network with the cross-entropy loss and the curriculum protocol with a Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice.
Experiment Setup | Yes | We train a single-layer network with the cross-entropy loss and the curriculum protocol with a Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. We optimised hyperparameters in each curriculum phase separately. We trained all combinations of five elastic penalties log-spaced between 1e-3 and 1e2 and weight decay parameters {0, 0.2, 0.5}.
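The reported sweep is small enough to write out explicitly. A sketch of the grid follows, assuming the five penalties are endpoint-inclusive log-spaced values (numpy's logspace convention); this is a reconstruction, not the authors' code.

```python
import numpy as np
from itertools import product

# Five elastic penalties log-spaced between 1e-3 and 1e2,
# crossed with the three weight-decay values from the paper.
elastic_penalties = np.logspace(-3, 2, num=5)
weight_decays = [0.0, 0.2, 0.5]

for lam, wd in product(elastic_penalties, weight_decays):
    # One training run per (consolidation strength, weight decay) pair.
    print(f"elastic penalty = {lam:.3g}, weight decay = {wd}")
```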