When Do Curricula Work?

Authors: Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct extensive experiments over thousands of orderings spanning three kinds of learning: curriculum, anti-curriculum, and random-curriculum... Our experiments demonstrate that curriculum, but not anti-curriculum can indeed improve the performance either with limited training time budget or in existence of noisy data."
Researcher Affiliation | Collaboration | Xiaoxia Wu (UChicago and TTIC, xwu@ttic.edu), Ethan Dyer (Blueshift, Alphabet, edyer@google.com), Behnam Neyshabur (Blueshift, Alphabet, neyshabur@google.com)
Pseudocode | Yes | "Algorithm 1 (Random-/Anti-) Curriculum learning with pacing and scoring functions", "Algorithm 2 Loss function", "Algorithm 3 Learned Epoch", "Algorithm 4 Estimated c-score" (a pacing/scoring sketch follows the table)
Open Source Code | Yes | "Code at https://github.com/google-research/understanding-curricula"
Open Datasets | Yes | "We train over 25,000 models over four datasets, CIFAR10/100, FOOD101, and FOOD101N" and "CIFAR10 (Krizhevsky & Hinton, 2009)", "FOOD101 (Bossard et al., 2014)", "FOOD101N (Lee et al., 2018)"
Dataset Splits | Yes | "For figures in Section 4 and 5, we use training samples 45000 and validation samples 5000. We look for the best test error of these 5000 validation samples and plot the corresponding test error/prediction." (a split sketch follows the table)
Hardware Specification | Yes | "We choose a batch size to be 128 and use one NVIDIA Tesla V100 GPU for each experiment." and "For FOOD101 and FOOD101N, we choose a batch size to be 256 and use 8 NVIDIA Tesla V100 GPUs for each experiment."
Software Dependencies | No | The paper mentions PyTorch and Caliban but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | "The data augmentation includes random horizontal flip and normalization, and the random training seeds are fixed to be {111, 222, 333}. We choose a batch size to be 128 and use one NVIDIA Tesla V100 GPU for each experiment. We use Caliban (Ritchie et al., 2020) and Google Cloud AI Platform to submit the jobs. The optimizer is SGD with 0.9 momentum, weight decay 5×10⁻⁵, and a learning rate scheduler - cosine decay with an initial value of 0.1." and "For FOOD101 and FOOD101N, we choose a batch size to be 256... The optimizer is SGD with 0.9 momentum, weight decay 1×10⁻⁵, and a learning rate scheduler - cosine decay with an initial value of 0.1." (an optimizer/scheduler sketch follows the table)
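
The pseudocode row above refers to curriculum learning driven by a scoring function (a per-sample difficulty estimate) and a pacing function (how much of the easiest data is exposed at each training step). Below is a minimal sketch of that pattern, not the paper's actual Algorithm 1: the linear pacing schedule, the function names, and the exposure fractions are illustrative assumptions.

import numpy as np

def linear_pacing(step, total_steps, start_frac=0.2, end_frac=1.0):
    # Fraction of the easiest-first training set exposed at this step.
    # A linear schedule is assumed here; the paper sweeps many pacing choices.
    return start_frac + (end_frac - start_frac) * min(step / total_steps, 1.0)

def curriculum_batch(scores, batch_size, step, total_steps, rng, anti=False):
    # scores: per-sample difficulty from a scoring function (lower = easier).
    # anti=True exposes the hardest samples first (anti-curriculum); shuffling
    # the scores once before training gives a random-curriculum ordering.
    order = np.argsort(scores)
    if anti:
        order = order[::-1]
    n_exposed = max(batch_size, int(linear_pacing(step, total_steps) * len(scores)))
    return rng.choice(order[:n_exposed], size=batch_size, replace=False)

# Example with dummy difficulty scores for 50,000 training samples.
rng = np.random.default_rng(111)
scores = rng.random(50_000)
batch_indices = curriculum_batch(scores, batch_size=128, step=100, total_steps=10_000, rng=rng)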
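
The dataset-splits row quotes a 45,000/5,000 train/validation split. The sketch below builds such a split for CIFAR10 with PyTorch/torchvision; the normalization statistics and the use of random_split seeded with 111 (one of the paper's reported seeds) are assumptions, since the exact split procedure is not quoted above.

import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Augmentation matching the quoted setup: random horizontal flip plus normalization.
# The per-channel statistics are the commonly used CIFAR10 values, assumed here.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])

full_train = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=train_transform)
train_set, val_set = random_split(full_train, [45_000, 5_000],
                                  generator=torch.Generator().manual_seed(111))
test_set = datasets.CIFAR10(root="./data", train=False, download=True,
                            transform=transforms.ToTensor())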
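
The experiment-setup row reports SGD with 0.9 momentum, weight decay 5×10⁻⁵, and a cosine-decayed learning rate starting at 0.1 for the CIFAR runs. A minimal PyTorch sketch of that configuration follows; the ResNet-18 architecture and the 100-epoch budget are placeholders rather than values quoted above.

import torch
from torchvision import models

model = models.resnet18(num_classes=10)   # placeholder network, not specified in the quotes above
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-5)

epochs = 100                              # placeholder training budget
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)

for epoch in range(epochs):
    # ... train one epoch with batch size 128 (e.g. batches drawn via the curriculum sketch) ...
    scheduler.step()                      # one step per epoch: cosine decay from 0.1 toward 0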