When Do Curricula Work?
Authors: Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments over thousands of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random-curriculum... Our experiments demonstrate that curriculum, but not anti-curriculum can indeed improve the performance either with limited training time budget or in existence of noisy data. |
| Researcher Affiliation | Collaboration | Xiaoxia Wu (UChicago and TTIC, xwu@ttic.edu); Ethan Dyer (Blueshift, Alphabet, edyer@google.com); Behnam Neyshabur (Blueshift, Alphabet, neyshabur@google.com) |
| Pseudocode | Yes | "Algorithm 1 (Random-/Anti-) Curriculum learning with pacing and scoring functions", "Algorithm 2 Loss function", "Algorithm 3 Learned Epoch", "Algorithm 4 Estimated c-score" (a hedged sketch of such a pacing/scoring loop appears after this table) |
| Open Source Code | Yes | Code at https://github.com/google-research/understanding-curricula |
| Open Datasets | Yes | "We train over 25,000 models over four datasets, CIFAR10/100, FOOD101, and FOOD101N" and "CIFAR10 (Krizhevsky & Hinton, 2009)", "FOOD101 (Bossard et al., 2014)", "FOOD101N (Lee et al., 2018)" |
| Dataset Splits | Yes | For figures in Section 4 and 5, we use training samples 45000 and validation samples 5000. We look for the best test error of these 5000 validation samples and plot the corresponding test error/prediction. |
| Hardware Specification | Yes | "We choose a batch size to be 128 and use one NVIDIA Tesla V100 GPU for each experiment." and "For FOOD101 and FOOD101N, we choose a batch size to be 256 and use 8 NVIDIA Tesla V100 GPUs for each experiment." |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'Caliban' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | "The data augmentation includes random horizontal flip and normalization, and the random training seeds are fixed to be {111, 222, 333}. We choose a batch size to be 128 and use one NVIDIA Tesla V100 GPU for each experiment. We use Caliban (Ritchie et al., 2020) and Google cloud AI platform to submit the jobs. The optimizer is SGD with 0.9 momentum, weight decay 5×10⁻⁵, and a learning rate scheduler - cosine decay with an initial value of 0.1." and "For FOOD101 and FOOD101N, we choose a batch size to be 256... The optimizer is SGD with 0.9 momentum, weight decay 1×10⁻⁵, and a learning rate scheduler - cosine decay with an initial value of 0.1." (a hedged sketch of this optimizer/schedule configuration appears after this table) |
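
The Pseudocode row cites the paper's Algorithm 1: curriculum learning driven by a scoring function (which ranks examples by difficulty) and a pacing function (which controls how much of the ranked data is visible at each point in training). The PyTorch sketch below only illustrates that pattern and is not the authors' released implementation; the linear pacing rule, the `start_frac` parameter, and the helper names `pacing` and `curriculum_loaders` are assumptions made here.

```python
import torch
from torch.utils.data import DataLoader, Subset

def pacing(epoch, total_epochs, dataset_size, start_frac=0.1):
    # Assumed linear pacing: start with a fraction of the data and grow to the full set.
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    return max(1, int(frac * dataset_size))

def curriculum_loaders(dataset, scores, total_epochs, batch_size=128, anti=False):
    # `scores` holds one difficulty value per example (e.g., a c-score or a
    # per-example loss); lower means easier. Sort once, then reveal a growing
    # easy prefix each epoch.
    order = torch.argsort(torch.as_tensor(scores), descending=anti)
    for epoch in range(total_epochs):
        n_visible = pacing(epoch, total_epochs, len(dataset))
        subset = Subset(dataset, order[:n_visible].tolist())
        yield DataLoader(subset, batch_size=batch_size, shuffle=True)
```

Sorting in descending order (`anti=True`) gives an anti-curriculum ordering, and replacing the sort with a random permutation gives the random-curriculum baseline the paper also studies.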
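
Similarly, the CIFAR optimizer settings quoted in the Experiment Setup row map onto standard PyTorch calls roughly as below. This is a sketch under assumptions: `model` and `total_epochs` are placeholders, and the 5×10⁻⁵ weight decay is the CIFAR value quoted above (FOOD101/FOOD101N use 1×10⁻⁵).

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder; the paper's architectures are not restated here
total_epochs = 100                  # placeholder training budget, not taken from the paper

# SGD with 0.9 momentum, weight decay 5e-5, and a cosine-decayed learning rate starting at 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-5)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_epochs)

for epoch in range(total_epochs):
    # ... one training pass (e.g., over the loader yielded by curriculum_loaders) ...
    scheduler.step()
```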