On the Statistical Benefits of Curriculum Learning

Authors: Ziping Xu, Ambuj Tewari

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we study the benefits of CL in the multitask linear regression problem under both structured and unstructured settings. For both settings, we derive the minimax rates for CL with the oracle that provides the optimal curriculum and without the oracle, where the agent has to adaptively learn a good curriculum. Our results reveal that adaptive learning can be fundamentally harder than the oracle learning in the unstructured setting, but it merely introduces a small extra term in the structured setting. To connect theory with practice, we provide justification for a popular empirical method that selects tasks with highest local prediction gain by comparing its guarantees with the minimax rates mentioned above. To complement the theoretical analyses, we conduct simulation studies by applying actual SGD with tasks chosen to maximize the local prediction gain.
Researcher Affiliation | Academia | Department of Statistics, University of Michigan, Ann Arbor.
Pseudocode | Yes | Algorithm 1: CL by optimistic scheduling
Open Source Code | No | The paper does not provide any statement or link indicating that its source code is publicly available.
Open Datasets | No | The paper describes how synthetic data was generated for simulations: "The true parameters of all the tasks are sampled from N(0, 0.001 I_d). On expectation, the transfer distance 2 t,T between task t and the target task is about 0.01d. The input x's are sampled from the same distribution N(0, I_d) for all the tasks." This is not a publicly accessible dataset with a specific name or link.
Dataset Splits | No | The paper mentions the "total number of observations n" for simulations but does not specify explicit train/validation/test splits (e.g., percentages or counts) or refer to standard predefined splits.
Hardware Specification | No | The paper describes simulation studies but does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run them.
Software Dependencies | No | The paper does not name any software or libraries, with version numbers, that would be required to reproduce the experiments.
Experiment Setup | Yes | We set T = 5 and σ_t^2 = 0.001, 0.01, 0.1, 1, 1 for t = 1, ..., 5, respectively. Note that the 5th task is the target task. We test the effects of total number of observations n = 10, 50, 100, 500, 1000 and the effects of dimension d = 5, 10, 50, 100. By default, we set n = 1000 and d = 5. In our experiment, η = 0.85.
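
Since no code or dataset is released, the synthetic data described in the Open Datasets and Experiment Setup entries would have to be regenerated. The following is a minimal sketch of that generation step only, not the authors' implementation. It assumes the listed σ values are per-task noise variances, that the noise is additive Gaussian, and that responses follow a linear model; the function names are hypothetical. The parameter η = 0.85 is not used because its role is not stated here.

```python
import random

# Values quoted in the experiment setup.
T = 5                                       # number of tasks; the 5th is the target
NOISE_VAR = [0.001, 0.01, 0.1, 1.0, 1.0]    # assumed: per-task noise variances
PARAM_VAR = 0.001                           # true parameters ~ N(0, 0.001 I_d)


def sample_true_params(d, rng):
    """Draw one d-dimensional parameter vector per task from N(0, PARAM_VAR * I_d)."""
    std = PARAM_VAR ** 0.5
    return [[rng.gauss(0.0, std) for _ in range(d)] for _ in range(T)]


def sample_observation(theta, noise_var, rng):
    """Draw one (x, y) pair with x ~ N(0, I_d) and y = <x, theta> + noise.

    Assumption: the noise is Gaussian with variance noise_var.
    """
    x = [rng.gauss(0.0, 1.0) for _ in theta]
    signal = sum(xi * ti for xi, ti in zip(x, theta))
    y = signal + rng.gauss(0.0, noise_var ** 0.5)
    return x, y


rng = random.Random(0)   # fixed seed so runs are repeatable
d = 5                    # default dimension in the setup
thetas = sample_true_params(d, rng)

# One observation from the target task (index T - 1).
x, y = sample_observation(thetas[T - 1], NOISE_VAR[T - 1], rng)
```

Calling `sample_observation` repeatedly on a chosen task index gives the stream of observations a curriculum would draw from; how tasks are chosen at each step is left out of this sketch.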