Understanding the Complexity Gains of Single-Task RL with a Curriculum

Authors: Qiyang Li, Yuexiang Zhai, Yi Ma, Sergey Levine

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also show that our theoretical insights can be translated into an effective practical learning algorithm that can accelerate curriculum learning on simulated robotic tasks. Empirically, we verify our theory on a tabular MDP and provide a practical implementation of ROLLIN that can accelerate curriculum learning in the tabular environment and a range of simulated robotic tasks.
Researcher Affiliation | Academia | UC Berkeley. Correspondence to: Qiyang Li <qcli@berkeley.edu>, Yuexiang Zhai <simonzhai@berkeley.edu>.
Pseudocode | Yes | Algorithm 1 Provably Efficient Learning via ROLLIN; Algorithm 2 Practical Implementation of ROLLIN; Algorithm 3 PG for α-Max Ent RL; Algorithm 4 Two-Phase SPG for α-Max Ent RL; Algorithm 5 Random-horizon SPG for α-Max Ent RL Update; Algorithm 6 SamSA: Sample s, a for SPG; Algorithm 7 EstEntQ: Unbiased Estimation of Max Ent Q; Algorithm 8 Practical Implementation of ROLLIN
Open Source Code | No | The paper states: 'We use the SAC implementation from https://github.com/ikostrikov/jaxrl (Kostrikov, 2021) for all our experiments in the paper.' This refers to a third-party implementation used by the authors, not a statement that their own code (for ROLLIN or their specific experimental setup) is open-sourced.
Open Datasets | Yes | We adopt the antmaze-umaze environment (Fu et al., 2020) for evaluating the performance of ROLLIN in goal-reaching tasks. For the non-goal-reaching tasks, we consider the tasks of gradually increasing the x-velocity of a locomotion agent in the following environments: walker2d, hopper, humanoid, and ant in OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper evaluates on specific environments such as antmaze-umaze and OpenAI Gym locomotion tasks, but it does not explicitly state the train/validation/test splits used in its experiments (e.g., percentages or sample counts).
Hardware Specification | No | The paper mentions running 'simulated robotic tasks' and acknowledges support from the 'Savio computational cluster provided by the Berkeley Research Compute program.' However, it does not specify any particular hardware components such as GPU or CPU models, or memory details.
Software Dependencies | No | The paper mentions the 'SAC implementation from https://github.com/ikostrikov/jaxrl (Kostrikov, 2021)' and the 'Adam optimizer (Kingma and Ba, 2015)' but does not provide version numbers for general software dependencies such as Python or JAX, which are critical for reproducibility.
Experiment Setup | Yes | For our antmaze-umaze experiments with oracle curriculum, we use a sparse reward function where the reward is 0 when the distance D between the ant and the goal is greater than 0.5 and r = exp(−5D) when the distance is smaller than or equal to 0.5. The performance threshold is set to be R = 200. SAC hyperparameters: initial temperature 1.0; target network update rate 0.005; learning rate for the Adam optimizer 0.0003; discount factor 0.99; batch size 256; warmup period (number of steps of initial random-action exploration) 10000; network size (256, 256). (Hedged illustrative sketches of this setup, of the environments above, and of the roll-in procedure follow this table.)
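To make the quoted experiment setup concrete, here is a minimal Python sketch of the described sparse reward and a config dict collecting the listed SAC hyperparameters; the function and key names are illustrative, not taken from the authors' code.

```python
import math

def antmaze_sparse_reward(distance_to_goal):
    """Sparse reward from the quoted setup: 0 when D > 0.5, exp(-5*D) when D <= 0.5."""
    if distance_to_goal > 0.5:
        return 0.0
    return math.exp(-5.0 * distance_to_goal)

# The SAC hyperparameters quoted above, gathered into a single config dict.
SAC_CONFIG = {
    "initial_temperature": 1.0,
    "target_update_rate": 0.005,   # soft-update rate for the target networks
    "learning_rate": 3e-4,         # Adam optimizer
    "discount": 0.99,
    "batch_size": 256,
    "warmup_steps": 10_000,        # initial random-action exploration
    "hidden_dims": (256, 256),     # network size
    "performance_threshold": 200,  # R from the quoted setup
}
```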
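The environments named under Open Datasets are standard registry environments; a minimal instantiation sketch, assuming gym and d4rl are installed (the exact environment IDs and version suffixes used by the authors are assumptions here):

```python
import gym
import d4rl  # noqa: F401 -- importing d4rl registers the antmaze-* environments (Fu et al., 2020)

# Goal-reaching task (ID assumed; the paper only names "antmaze-umaze").
goal_env = gym.make("antmaze-umaze-v2")

# Locomotion tasks whose target x-velocity is gradually increased by the curriculum
# (version suffixes assumed).
locomotion_envs = [gym.make(name)
                   for name in ("Walker2d-v3", "Hopper-v3", "Humanoid-v3", "Ant-v3")]
```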
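For the Pseudocode row, a hypothetical sketch of the roll-in idea that the listed algorithms build on, as we understand it: roll the environment forward with the previous curriculum stage's policy before handing control to the policy being trained. This is not the authors' Algorithm 2; the function name, the random roll-in length, and the choice to store only the current policy's transitions are all assumptions.

```python
import random

def collect_episode_with_rollin(env, prev_policy, curr_policy, max_rollin_steps, horizon):
    """Collect one episode, rolling in with prev_policy before switching to curr_policy."""
    obs = env.reset()
    transitions = []
    rollin_steps = random.randint(0, max_rollin_steps)  # random roll-in length (assumption)
    for t in range(horizon):
        # The previous stage's policy controls the first `rollin_steps` steps; afterwards
        # the policy being trained on the current stage takes over.
        policy = prev_policy if t < rollin_steps else curr_policy
        action = policy(obs)
        obs_next, reward, done, _ = env.step(action)
        if t >= rollin_steps:  # assumption: only the current policy's transitions are stored
            transitions.append((obs, action, reward, obs_next, done))
        obs = obs_next
        if done:
            break
    return transitions
```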