An Analytical Theory of Curriculum Learning in Teacher-Student Networks
Authors: Luca Saglietti, Stefano Sarao Mannelli, Andrew Saxe
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically derive average learning trajectories for simple neural networks on this task, which establish a clear speed benefit for curriculum learning in the online setting. However, when training experiences can be stored and replayed, the advantage of curriculum in standard neural networks disappears, in line with observations from the deep learning literature. ... We derive generalisation performance as a function of consolidation strength (implemented as an L2 regularisation/elastic coupling connecting learning phases), and show that curriculum-aware algorithms can yield a large improvement in test performance. ... To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. As shown in Fig. 7b, curriculum improves performance, particularly when easy examples make up a large proportion of the dataset, confirming that curricula that reduce clutter can benefit learning. (A minimal sketch of this input construction follows the table.) |
| Researcher Affiliation | Collaboration | Department of Computing Sciences, Bocconi University. Gatsby Computational Neuroscience Unit & Sainsbury Wellcome Centre, University College London. FAIR, Meta AI. (Equal contributions.) |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The architecture used is very simple to simulate and we do not provide code. |
| Open Datasets | Yes | To verify this prediction in a richer visual setting, we construct a simple cluttered object classification task from the CIFAR10 dataset [55] by patching two images together into a 32×64 input image (Fig. 7a). ... We use and cite the CIFAR10 dataset. |
| Dataset Splits | No | Full parameters for the experiments on real data are given in the SM. |
| Hardware Specification | No | We report estimated total amount of compute and type of compute in the SM for the experiment on real data (~10000 GPU hours, ~1110 kg CO2 eq). |
| Software Dependencies | No | We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. (A minimal Lightning sketch appears below the table.) |
| Experiment Setup | Yes | We train a single-layer network with the cross-entropy loss and the curriculum protocol with Gaussian prior between two curriculum stages, implemented in PyTorch Lightning to ensure that training parameters accord with standard practice. We optimised hyperparameters in each curriculum phase separately. We trained all combinations of five elastic penalties log-spaced between 1e-3 and 1e2, and weight decay parameters {0, 0.2, 0.5}. (A sketch of this objective and grid appears below the table.) |
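The cluttered task quoted in the "Research Type" and "Open Datasets" rows patches two CIFAR-10 images side by side into one 32×64 input. Since the authors released no code, the sketch below is only one plausible reading of that construction; the function name `make_cluttered_example` and the convention that the left image carries the label are assumptions, not the paper's implementation.

```python
# Minimal sketch of the cluttered CIFAR-10 input described in the paper:
# two 32x32 images concatenated along the width into one 32x64 input, with
# the target class taken from one of them (assumed here: the left image).
import torch
from torchvision import datasets, transforms

cifar = datasets.CIFAR10(root="./data", train=True, download=True,
                         transform=transforms.ToTensor())

def make_cluttered_example(dataset, idx_target, idx_clutter):
    """Patch a labelled target image and an unlabelled clutter image together."""
    x_t, y_t = dataset[idx_target]    # (3, 32, 32), label kept
    x_c, _ = dataset[idx_clutter]     # clutter image, label discarded
    x = torch.cat([x_t, x_c], dim=2)  # concatenate widths -> (3, 32, 64)
    return x, y_t

x, y = make_cluttered_example(cifar, 0, 1)
assert x.shape == (3, 32, 64)
```

An "easy" example in this setup would presumably carry an uninformative clutter patch; the exact easy/hard construction is deferred to the paper's SM.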
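The "Software Dependencies" row only establishes that a single-layer network was trained with the cross-entropy loss in PyTorch Lightning. A minimal Lightning module consistent with that description is sketched below; the SGD optimiser and learning rate are assumptions, since the full training parameters are deferred to the SM.

```python
# Hypothetical single-layer classifier matching the quoted description;
# the optimiser choice and learning rate are assumed, not taken from the paper.
import pytorch_lightning as pl
import torch
import torch.nn.functional as F

class SingleLayerNet(pl.LightningModule):
    def __init__(self, in_dim=3 * 32 * 64, n_classes=10, lr=1e-2, wd=0.0):
        super().__init__()
        self.linear = torch.nn.Linear(in_dim, n_classes)
        self.lr, self.wd = lr, wd

    def forward(self, x):
        # Flatten the (3, 32, 64) cluttered input into one vector per example.
        return self.linear(x.flatten(1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = F.cross_entropy(self(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=self.lr,
                               weight_decay=self.wd)
```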
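The "Experiment Setup" row describes an elastic (Gaussian-prior) L2 coupling connecting the two curriculum phases, searched over five log-spaced elastic penalties and three weight-decay values. The objective below is inferred from that description; `phase2_loss` and the loop placeholder are hypothetical names, and the exact coupling form used by the authors may differ.

```python
# Sketch of a phase-2 objective with elastic coupling to the phase-1
# solution, plus the quoted hyperparameter grid. Names are illustrative.
import itertools
import torch
import torch.nn.functional as F

elastic_penalties = torch.logspace(-3, 2, steps=5).tolist()  # five values, 1e-3 .. 1e2
weight_decays = [0.0, 0.2, 0.5]

def phase2_loss(logits, targets, w, w_phase1, lam, wd):
    ce = F.cross_entropy(logits, targets)
    elastic = 0.5 * lam * torch.sum((w - w_phase1) ** 2)  # Gaussian prior centred on phase-1 weights
    decay = 0.5 * wd * torch.sum(w ** 2)                  # standard weight decay
    return ce + elastic + decay

for lam, wd in itertools.product(elastic_penalties, weight_decays):
    ...  # train a fresh phase-2 model with this (lam, wd) combination
```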