CurBench: Curriculum Learning Benchmark

Authors: Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Therefore, we develop CurBench, the first benchmark that supports systematic evaluations for curriculum learning. Specifically, it consists of 15 datasets spanning 3 research domains: computer vision, natural language processing, and graph machine learning, along with 3 settings: standard, noise, and imbalance. To facilitate a comprehensive comparison, we establish the evaluation from 2 dimensions: performance and complexity. CurBench also provides a unified toolkit that plugs automatic curricula into general machine learning processes, enabling the implementation of 15 core curriculum learning methods. On the basis of this benchmark, we conduct comparative experiments and make empirical analyses of existing methods. (A sketch of such a curriculum plug-in interface appears after this table.)
Researcher Affiliation | Academia | Yuwei Zhou, Zirui Pan, Xin Wang, Hong Chen, Haoyang Li, Yanwen Huang, Zhixiao Xiong, Fangzhou Xiong, Peiyang Xu, Shengnan Liu, Wenwu Zhu (Department of Computer Science and Technology, BNRIST, Tsinghua University).
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. It describes modules and a pipeline, but not in an algorithmic format.
Open Source Code | Yes | CurBench is open-source and publicly available at https://github.com/THUMNLab/CurBench.
Open Datasets | Yes | Table 1: The statistics of the 15 datasets adopted in CurBench, which cover a wide range of scales across 3 research domains in 3 settings. Table 4: The home pages, download links, and licenses of datasets. All the datasets included in CurBench are publicly available for research.
Dataset Splits | Yes | For these 3 datasets, we split the original training set into a new training set and a validation set with a 9:1 ratio. For the other 3 datasets from TUDataset, we randomly divide the original datasets into training, validation, and test sets with an 8:1:1 ratio. (A sketch of these splits appears after this table.)
Hardware Specification | Yes | We record the training time and maximum memory consumption on the same GPU device as the indicators of the complexity. Table 8: Time and space complexity, measured by training time and GPU memory usage on an NVIDIA V100 GPU. (A sketch of this measurement harness appears after this table.)
Software Dependencies | No | The paper mentions software components such as PyTorch, Hugging Face, and PyTorch Geometric, but it does not specify their version numbers, which are needed for reproducible software dependencies.
Experiment Setup | Yes | To ensure a fair and reproducible evaluation, we fix all possible confounding factors and report the average and standard deviation results of 5 runs with different fixed random seeds for each combination of datasets, backbone models, and settings. The detailed hyperparameters for both training processes and curriculum learning methods are presented in the Appendix. Appendix E (Training Hyperparameters), LeNet, ResNet-18, ViT: we choose a batch size of 50, and use an Adam optimizer to train the model with a constant learning rate of 0.0001 for 200 epochs. Appendix F (Method Hyperparameters): For a reproducible evaluation, we demonstrate the hyperparameters that we select for curriculum learning methods in Table 7.
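
The Research Type row above quotes the claim that CurBench "plugs automatic curricula into general machine learning processes." Below is a minimal sketch of what such a plug-in interface could look like; the class and method names (BaselineCurriculum, data_curriculum, loss_curriculum) are illustrative assumptions, not the CurBench API. The hooks select the data used at each epoch and reweight the per-sample losses, which is the general pattern curriculum methods follow.

```python
import torch
from torch.utils.data import DataLoader


class BaselineCurriculum:
    """Hypothetical curriculum hooks: pick the training data for each epoch
    and reweight per-sample losses. Names are illustrative, not CurBench's."""

    def data_curriculum(self, dataset, epoch, total_epochs):
        # Trivial baseline: train on the full dataset at every epoch.
        return DataLoader(dataset, batch_size=50, shuffle=True)

    def loss_curriculum(self, losses, epoch, total_epochs):
        # Trivial baseline: no per-sample reweighting.
        return losses.mean()


def train(model, dataset, curriculum, total_epochs=200, lr=1e-4, device="cuda"):
    """Generic classification training loop with curriculum hooks inserted
    before each epoch (data selection) and after the per-sample loss."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = torch.nn.CrossEntropyLoss(reduction="none")
    for epoch in range(total_epochs):
        loader = curriculum.data_curriculum(dataset, epoch, total_epochs)
        model.train()
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            losses = criterion(model(inputs), targets)  # per-sample losses
            loss = curriculum.loss_curriculum(losses, epoch, total_epochs)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```

A concrete curriculum method would override the two hooks, for example returning a loader over only the "easiest" samples in early epochs.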
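
The Dataset Splits row reports a 9:1 train/validation split of the original training set and an 8:1:1 train/validation/test split for the TUDataset graphs. A minimal sketch of reproducing such splits with a fixed generator follows; the seed value is an assumption for illustration, not a value taken from the paper.

```python
import torch
from torch.utils.data import random_split


def split_9_1(train_set, seed=0):
    """Split an existing training set into new train/validation sets (9:1)."""
    n_val = len(train_set) // 10
    n_train = len(train_set) - n_val
    gen = torch.Generator().manual_seed(seed)  # seed value is illustrative
    return random_split(train_set, [n_train, n_val], generator=gen)


def split_8_1_1(dataset, seed=0):
    """Randomly split a whole dataset into train/validation/test sets (8:1:1)."""
    n_val = len(dataset) // 10
    n_test = len(dataset) // 10
    n_train = len(dataset) - n_val - n_test
    gen = torch.Generator().manual_seed(seed)  # seed value is illustrative
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)
```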
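
The Hardware Specification and Experiment Setup rows describe averaging over 5 runs with fixed random seeds while recording training time and peak GPU memory. A sketch of that measurement harness is below; it assumes a CUDA device and a user-supplied run_training(seed) function that trains a model and returns its accuracy, neither of which is defined in the paper.

```python
import random
import time

import numpy as np
import torch


def set_seed(seed):
    """Fix the common sources of randomness for a reproducible run."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)


def benchmark(run_training, seeds=(0, 1, 2, 3, 4)):
    """Run training with several fixed seeds and record accuracy,
    wall-clock training time, and peak GPU memory for each run."""
    accs, times, mems = [], [], []
    for seed in seeds:
        set_seed(seed)
        torch.cuda.reset_peak_memory_stats()
        start = time.time()
        acc = run_training(seed)  # hypothetical training function
        times.append(time.time() - start)
        mems.append(torch.cuda.max_memory_allocated() / 2**20)  # MiB
        accs.append(acc)
    return (np.mean(accs), np.std(accs)), np.mean(times), max(mems)
```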