On The Power of Curriculum Learning in Training Deep Networks
Authors: Guy Hacohen, Daphna Weinshall
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our empirical evaluation, both methods show similar benefits in terms of increased learning speed and improved final performance on test data. We address challenge (ii) by investigating different pacing functions to guide the sampling. The empirical investigation includes a variety of network architectures, using images from CIFAR-10, CIFAR-100 and subsets of ImageNet. We conclude with a novel theoretical analysis of curriculum learning, where we show how it effectively modifies the optimization landscape. |
| Researcher Affiliation | Academia | School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem 91904, Israel; Edmond and Lily Safra Center for Brain Sciences, The Hebrew University of Jerusalem, Jerusalem 91904, Israel. |
| Pseudocode | Yes | Algorithm 1 Curriculum learning method |
| Open Source Code | Yes | All the code used in the paper is available at https://github.com/GuyHacohen/curriculum_learning |
| Open Datasets | Yes | The dataset is the small mammals super-class of CIFAR-100 (Krizhevsky & Hinton, 2009), a subset of 3000 images from CIFAR-100, divided into 5 classes. ... using images from CIFAR-10, CIFAR-100 and subsets of ImageNet (Deng et al., 2009). |
| Dataset Splits | Yes | To avoid contamination of the conclusions, all results were cross-validated, wherein the hyper-parameters are chosen based on performance on a validation set before being used on the test set. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions models like the Inception network, a VGG-based network, and ResNet, but does not provide specific version numbers for software dependencies or libraries used in the implementation. |
| Experiment Setup | Yes | Hyper-parameter tuning. As in all empirical studies involving deep learning, the results are quite sensitive to the values of the hyper-parameters, hence parameter tuning is required. ... fixed exponential pacing, varied exponential pacing and single step pacing define 3, 5 and 2 new hyper-parameters respectively, henceforth called the pacing hyper-parameters. ... we set an initial learning rate and decrease it exponentially every fixed number of iterations. This method gives rise to 3 learning rate hyperparameters which require tuning: (i) the initial learning rate; (ii) the factor by which the learning rate is decreased; (iii) the length of each step with constant learning rate. |
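
The Pseudocode row cites Algorithm 1, the paper's curriculum training loop, which combines a scoring function (ordering examples from easy to hard) with a pacing function (how much of the sorted data the learner may sample at each step). The following is a minimal NumPy sketch of such a loop under stated assumptions: the function names, default values, and random "difficulty scores" are illustrative placeholders, not the paper's tuned settings or its released implementation.

```python
import numpy as np

def exponential_pacing(step, start_frac=0.04, inc=1.9, step_length=100, total_size=2500):
    """Fixed exponential pacing (illustrative defaults): the exposed fraction of
    the training set starts at `start_frac` and is multiplied by `inc` every
    `step_length` batches, capped at the full set size."""
    frac = min(start_frac * (inc ** (step // step_length)), 1.0)
    return int(np.ceil(frac * total_size))

def curriculum_batches(data, labels, difficulty_scores, num_batches, batch_size, pacing_fn):
    """Yield mini-batches in the spirit of the paper's Algorithm 1: sort examples
    from easy to hard by `difficulty_scores`, then at each step sample a batch
    uniformly from the easiest prefix whose size is given by the pacing function."""
    order = np.argsort(difficulty_scores)              # ascending = easiest first
    sorted_data, sorted_labels = data[order], labels[order]
    for step in range(num_batches):
        exposed = pacing_fn(step)                      # number of easy examples available now
        idx = np.random.choice(exposed, size=batch_size, replace=True)
        yield sorted_data[idx], sorted_labels[idx]

if __name__ == "__main__":
    # Toy usage with random data; in the paper the scores come, e.g., from the
    # confidence of a larger pretrained network ("transfer scoring").
    X = np.random.randn(2500, 32, 32, 3).astype(np.float32)
    y = np.random.randint(0, 5, size=2500)
    scores = np.random.rand(2500)                      # placeholder difficulty scores
    pacing = lambda s: exponential_pacing(s, total_size=len(X))
    for xb, yb in curriculum_batches(X, y, scores, num_batches=5, batch_size=100, pacing_fn=pacing):
        print(xb.shape, yb.shape)
```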
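The Experiment Setup row also describes the learning-rate policy through its three tunable quantities: the initial rate, the multiplicative decay factor, and the number of iterations spent at each constant rate. A minimal sketch of such a step-wise exponential schedule is shown below; the numeric values are placeholders, since the paper's actual hyper-parameters are chosen by cross-validation on a validation set.

```python
def exponential_lr(step, initial_lr=0.05, decay=0.9, step_length=400, min_lr=1e-4):
    """Step-wise exponential learning-rate decay: start at `initial_lr` and
    multiply by `decay` every `step_length` iterations. The floor `min_lr`
    and all defaults are illustrative, not the paper's tuned values."""
    return max(initial_lr * (decay ** (step // step_length)), min_lr)

# Example: lr at iterations 0, 400, and 800 is 0.05, 0.045, and 0.0405.
```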