C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks
Authors: Tianjun Zhang, Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine, Joseph E. Gonzalez
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that our method is more sample efficient than prior methods. Moreover, it is able to solve very long-horizon manipulation and navigation tasks, tasks that prior goal-conditioned methods and methods based on graph search fail to solve. Section 5 (Experiments): Our experiments study whether C-Planning can compete with prior goal-conditioned RL methods both on benchmark tasks and on tasks designed to pose a significant planning and exploration challenge. |
| Researcher Affiliation | Academia | Tianjun Zhang (UC Berkeley, tianjunz@berkeley.edu); Benjamin Eysenbach (Carnegie Mellon University, beysenba@cs.cmu.edu); Ruslan Salakhutdinov (Carnegie Mellon University); Sergey Levine (UC Berkeley); Joseph E. Gonzalez (UC Berkeley) |
| Pseudocode | Yes | Algorithm 1: C-Planning performs planning in data collection, modifies C-learning by L5 L6 7. The update for the policy and classifier (L9) is the same. Algorithm 2: C-Planning samples the intermediate waypoints, then commands the agent to reach them. |
| Open Source Code | Yes | Our code is available at https://github.com/tianjunz/c-planning. |
| Open Datasets | Yes | The first set of environments is taken from the Metaworld suite (Yu et al., 2020), a common benchmark for goal-conditioned RL. |
| Dataset Splits | No | The paper describes environments and tasks (Metaworld, 2D mazes) and discusses data collection during training, but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud computing specifications used for running experiments. |
| Software Dependencies | No | The paper refers to using the SAC algorithm and building upon C-learning, but does not provide specific version numbers for any software libraries, frameworks, or dependencies used in the implementation. |
| Experiment Setup | Yes | We provide the essential hyperparameters for reproducing our experiments in this section. We also introduce the hyperparameters used in baselines and provide a detailed description of environmental design. Table 1: Hyperparameters used for C-Planning in all the environments in Meta World. Table 2: Hyperparameters used for C-Planning in all the environments in 2D navigation maze. |