Curriculum Learning by Dynamic Instance Hardness
Authors: Tianyi Zhou, Shengjie Wang, Jeffrey Bilmes
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we evaluate several variants of DIHCL and compare them against random mini-batch SGD as well as recent curriculum learning algorithms on 11 datasets. DIHCL shows an advantage over other baselines in terms both of time/sample efficiency and test set accuracy. (from Section 1 Introduction) |
| Researcher Affiliation | Academia | Paul G. Allen School of Computer Science & Engineering and Department of Electrical & Computer Engineering, University of Washington, Seattle |
| Pseudocode | Yes | Algorithm 1 DIH Curriculum Learning (DIHCL-Greedy); a hedged sketch of this greedy selection loop is given after the table. |
| Open Source Code | Yes | The code of DIHCL is available at https://github.com/tianyizhou/DIHCL. |
| Open Datasets | Yes | We train different DNNs by using variants of DIHCL, and compare them with three baselines, vanilla random mini-batch SGD, self-paced learning (SPL) [25], and minimax curriculum learning (MCL) [52] on 11 image classification datasets (without pre-training), i.e., (A) WideResNet-28-10 [50] on CIFAR10 and CIFAR100 [24]; (B) ResNeXt50-32x4d [49] on Food-101 [6], FGVC Aircraft (Aircraft) [30], Stanford Cars [23], and Birdsnap [5]; (C) ResNet50 [14] on ImageNet [11]; (D) WideResNet-16-8 on Fashion-MNIST (FMNIST) [48] and Kuzushiji-MNIST (KMNIST) [8]; (E) PreActResNet34 [14] on STL10 [9] and SVHN [34]. |
| Dataset Splits | No | The paper frequently refers to 'training' and 'test' sets, for instance, 'The final test accuracy achieved by each method is reported in Table 1.' However, it does not explicitly provide details about specific training, validation, and test dataset splits (e.g., percentages or sample counts) for the reproduction of experiments. While standard splits are implied for commonly used datasets like CIFAR-10, the validation split is not specified. |
| Hardware Specification | No | No specific hardware details (such as exact GPU/CPU models, processor types, or memory amounts) used for running experiments were provided. The paper only states: 'Some GPUs used to produce the experimental results are donated by NVIDIA.' |
| Software Dependencies | No | No specific ancillary software details with version numbers (e.g., Python, PyTorch, CUDA versions) needed to replicate the experiment were provided. The paper only mentions general methods like 'random mini-batch SGD' and 'cyclic cosine annealing learning rate schedule'. |
| Experiment Setup | Yes | We use mini-batch SGD with a momentum of 0.9 and a cyclic cosine annealing learning rate schedule [29] (multiple epochs with starting/target learning rate decayed by a multiplicative factor 0.85). We use T_0 = 5, γ = 0.95, γ_k = 0.85 for all DIHCL variants, and gradually reduce k from n to 0.2n. ... For DIHCL variants that further reduce S_t by solving Eq. (3), we use λ_1 = 1.0, γ_λ = 0.8, γ_{k0} = 0.4... (from Section 4 Experiments) A hedged sketch of this learning rate schedule follows the DIHCL sketch below. |
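
For concreteness, below is a minimal sketch of the greedy DIH selection loop referenced in the Pseudocode row, not the authors' exact Algorithm 1: per-sample loss stands in for the instantaneous hardness signal a_t(i), DIH is its exponential moving average r_t(i) = γ·r_{t−1}(i) + (1−γ)·a_t(i), and the placeholder `train_and_get_losses` function, epoch count, and dataset size are illustrative assumptions. The hyperparameter values (T_0 = 5, γ = 0.95, γ_k = 0.85, k shrinking from n to 0.2n) follow the Experiment Setup row.

```python
# Hedged sketch of greedy DIH-based curriculum selection (illustrative,
# not the authors' exact implementation).
import numpy as np

rng = np.random.default_rng(0)

n = 1000              # training-set size (illustrative)
epochs = 30
T0 = 5                # warm-start epochs over the full data (paper: T_0 = 5)
gamma = 0.95          # DIH discount factor (paper: γ = 0.95)
gamma_k = 0.85        # multiplicative decay of the selection size k (paper: γ_k = 0.85)
k_min = int(0.2 * n)  # paper reduces k from n down to 0.2n

dih = np.zeros(n)     # running DIH estimate r_t(i) per sample
k = n

def train_and_get_losses(indices):
    """Placeholder for one SGD pass over the selected subset; returns the
    per-sample losses observed during that pass (the hardness signal)."""
    return rng.random(len(indices))

for epoch in range(epochs):
    if epoch < T0:
        selected = np.arange(n)           # warm start: train on every sample
    else:
        selected = np.argsort(-dih)[:k]   # greedy: top-k samples by DIH
        k = max(k_min, int(gamma_k * k))  # shrink k for the next epoch

    losses = train_and_get_losses(selected)
    # EMA update of DIH, only for the samples actually trained on:
    # r_t(i) = gamma * r_{t-1}(i) + (1 - gamma) * a_t(i)
    dih[selected] = gamma * dih[selected] + (1 - gamma) * losses
```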
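
The cyclic cosine annealing schedule quoted in the Experiment Setup row can be read as follows; this is an assumed interpretation, and the cycle length and initial learning rates here are illustrative, not values from the paper. Within each cycle the learning rate follows a cosine from a starting value down to a target value, and both endpoints are multiplied by 0.85 at every cycle boundary.

```python
# Hedged sketch of cyclic cosine annealing with decaying start/target
# learning rates (assumed interpretation of the schedule in [29]).
import math

def cyclic_cosine_lr(step, cycle_len, lr_start, lr_end):
    """Cosine interpolation from lr_start down to lr_end within one cycle."""
    t = (step % cycle_len) / cycle_len
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * t))

lr_start, lr_end = 0.1, 0.001  # illustrative endpoints, not from the paper
decay = 0.85                   # per-cycle decay of both endpoints (paper: 0.85)
cycle_len = 10                 # epochs per cycle (assumed; not specified)

for epoch in range(40):
    lr = cyclic_cosine_lr(epoch, cycle_len, lr_start, lr_end)
    # ... one epoch of mini-batch SGD with momentum 0.9 at this lr ...
    if (epoch + 1) % cycle_len == 0:  # cycle boundary reached
        lr_start *= decay             # decay the starting learning rate
        lr_end *= decay               # decay the target learning rate
```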