CurBench: Curriculum Learning Benchmark
Authors: Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Therefore, we develop CurBench, the first benchmark that supports systematic evaluations for curriculum learning. Specifically, it consists of 15 datasets spanning 3 research domains: computer vision, natural language processing, and graph machine learning, along with 3 settings: standard, noise, and imbalance. To facilitate a comprehensive comparison, we establish the evaluation from 2 dimensions: performance and complexity. CurBench also provides a unified toolkit that plugs automatic curricula into general machine learning processes, enabling the implementation of 15 core curriculum learning methods. On the basis of this benchmark, we conduct comparative experiments and make empirical analyses of existing methods. (An illustrative sketch of such a curriculum plug-in appears after this table.) |
| Researcher Affiliation | Academia | Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu; Department of Computer Science and Technology, BNRIST, Tsinghua University. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It describes modules and a pipeline but not in an algorithmic format. |
| Open Source Code | Yes | CurBench is open-source and publicly available at https://github.com/THUMNLab/CurBench. |
| Open Datasets | Yes | Table 1. The statistics of 15 datasets adopted in CurBench, which covers a wide range of scales across 3 research domains in 3 settings. Table 4. The home pages, download links, and licenses of datasets. All the datasets included in CurBench are publicly available for research. |
| Dataset Splits | Yes | For these 3 datasets, we split the original training set into a new training set and a validation set with a 9:1 ratio. For the other 3 datasets from TUDataset, we randomly divide the original datasets into training, validation, and test sets with an 8:1:1 ratio. (See the split sketch after this table.) |
| Hardware Specification | Yes | We record the training time and maximum memory consumption on the same GPU device as the indicators of the complexity. Table 8. Time and space complexity, measured by training time and GPU memory usage on NVIDIA V100 GPU. |
| Software Dependencies | No | The paper mentions software components like PyTorch, Hugging Face, and PyTorch Geometric, but it does not specify their version numbers, which is necessary for reproducible software dependencies. |
| Experiment Setup | Yes | To ensure a fair and reproducible evaluation, we fix all possible confounding factors and report the average and standard deviation results of 5 runs with different fixed random seeds for each combination of datasets, backbone models, and settings. The detailed hyperparameters for both training processes and curriculum learning methods are presented in the Appendix. Appendix E. Training Hyperparameters: LeNet, ResNet-18, ViT: We choose a batch size of 50, and use an Adam optimizer to train the model with a constant learning rate of 0.0001 for 200 epochs. Appendix F. Method Hyperparameters: For a reproducible evaluation, we demonstrate the hyperparameters that we select for curriculum learning methods in Table 7. (See the training-loop sketch after this table.) |
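The Research Type row notes that CurBench "plugs automatic curricula into general machine learning processes." The sketch below illustrates the general idea of such a plug-in with a difficulty-ordered subset and a linear pacing function; it is not CurBench's actual API, and `pacing`, `curriculum_loader`, and the precomputed `difficulty` scores are hypothetical names.

```python
import torch
from torch.utils.data import DataLoader, Subset

# Illustrative curriculum plug-in (NOT the CurBench API): samples are sorted by a
# precomputed difficulty score and revealed gradually, easy first, as epochs advance.

def pacing(epoch, total_epochs, n_samples, start_frac=0.3):
    """Linear pacing: fraction of the easy-first data visible at a given epoch."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * epoch / max(1, total_epochs - 1))
    return max(1, int(frac * n_samples))

def curriculum_loader(dataset, difficulty, epoch, total_epochs, batch_size=50):
    """Return a DataLoader over the k easiest samples for this epoch."""
    order = torch.argsort(torch.as_tensor(difficulty))  # easy -> hard
    k = pacing(epoch, total_epochs, len(dataset))
    return DataLoader(Subset(dataset, order[:k].tolist()),
                      batch_size=batch_size, shuffle=True)
```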
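The split ratios quoted in the Dataset Splits row (9:1 train/validation, 8:1:1 train/validation/test) can be reproduced with standard tooling. A minimal sketch, assuming generic PyTorch dataset objects rather than CurBench's own loading code:

```python
import torch
from torch.utils.data import random_split

def split_train_val(train_set, seed=0):
    """Split an existing training set into new train/val with a 9:1 ratio."""
    n_val = len(train_set) // 10
    n_train = len(train_set) - n_val
    gen = torch.Generator().manual_seed(seed)
    return random_split(train_set, [n_train, n_val], generator=gen)

def split_train_val_test(dataset, seed=0):
    """Randomly split a whole dataset into train/val/test with an 8:1:1 ratio."""
    n_val = len(dataset) // 10
    n_test = len(dataset) // 10
    n_train = len(dataset) - n_val - n_test
    gen = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)
```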
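The Experiment Setup row quotes the training protocol: Adam with a constant learning rate of 0.0001, batch size 50, 200 epochs, and results averaged over 5 fixed seeds. The sketch below shows that protocol in outline; `build_model`, `train_one_epoch`, and `evaluate` are placeholder callables, not CurBench functions.

```python
import statistics
import torch
from torch.utils.data import DataLoader

def run_once(seed, train_set, test_set, build_model, train_one_epoch, evaluate):
    """One run under the quoted protocol: fixed seed, Adam lr=1e-4, batch 50, 200 epochs."""
    torch.manual_seed(seed)
    model = build_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    loader = DataLoader(train_set, batch_size=50, shuffle=True)
    for _ in range(200):
        train_one_epoch(model, loader, optimizer)
    return evaluate(model, test_set)

def run_benchmark(train_set, test_set, build_model, train_one_epoch, evaluate,
                  seeds=(0, 1, 2, 3, 4)):
    """Report mean and standard deviation over 5 fixed random seeds."""
    scores = [run_once(s, train_set, test_set, build_model,
                       train_one_epoch, evaluate) for s in seeds]
    return statistics.mean(scores), statistics.stdev(scores)
```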