Flexible and Efficient Long-Range Planning Through Curious Exploration
Authors: Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, Daniel Yamins
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks. |
| Researcher Affiliation | Academia | ¹Rice University, ²Shanghai Jiao Tong University, ³Stanford University. Correspondence to: Aidan Curtis <curtisa@mit.edu>. |
| Pseudocode | Yes | Algorithm 1: The CSP algorithm. Input: initial state s_0, goal set G, dynamics f. Output: path {(s_0, a_0, s_1), ..., (s_{n-1}, a_{n-1}, s_n)} where s_n ∈ G and f(s_i, a_i) = s_{i+1}. (A hedged sketch of such a planning loop appears after this table.) |
| Open Source Code | No | The paper mentions a link to a video ('https://youtu.be/7DSW8Dy9ADQ') but does not provide any explicit statement or link for the open-source code of their methodology. |
| Open Datasets | No | The paper describes a simulated environment using Bullet (Coumans, 2015) and pybullet-planning library (Garrett, 2018) for its experiments, but it does not specify any publicly available or open dataset used for training, nor does it provide a link, DOI, or formal citation for such a dataset. |
| Dataset Splits | No | The paper describes training processes for neural networks (e.g., action selection networks are trained using PPO) and mentions 'validation' in the context of network settings, but it does not provide explicit dataset splits (e.g., percentages or sample counts) for training, validation, and test sets. The experiments are conducted in a simulated environment rather than on a pre-defined static dataset with such splits. |
| Hardware Specification | No | The paper describes the simulated robot arm and environment (e.g., 'a mounted robot arm with seven bounded revolute joints'), but it does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as Bullet (Coumans, 2015), the pybullet-planning library (Garrett, 2018), and Proximal Policy Optimization (PPO) (Schulman et al., 2017), but it does not give version numbers for these dependencies, which would be needed for exact replication. |
| Experiment Setup | No | The paper provides some neural network architecture details, such as 'three-layer networks with 64 hidden units each, using the tanh activation function,' and states that 'The networks are trained using actor-critic reinforcement learning (namely, Proximal Policy Optimization (PPO) (Schulman et al., 2017)),' but it does not specify concrete hyperparameter values such as learning rate, batch size, or number of epochs in the main text; further implementation details are deferred to the supplement. (A hedged sketch of the described network setup follows this table.) |
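The Algorithm 1 signature quoted in the Pseudocode row only fixes the planner's inputs and outputs. Below is a minimal, hypothetical sketch of a forward-search loop consistent with that signature; the uniform-random sampling stands in for the paper's curiosity-driven expansion, and `sample_action`, `in_goal_set`, and the tree bookkeeping are illustrative names rather than the authors' implementation.

```python
import random


def plan(s0, in_goal_set, f, sample_action, max_expansions=10_000):
    """Search forward from s0 until a state in the goal set is reached.

    Returns a list of (s_i, a_i, s_{i+1}) transitions whose final state lies
    in the goal set, or None if the expansion budget is exhausted.
    """
    parents = {id(s0): None}          # child-state id -> (parent state, action)
    frontier = [s0]                   # states available for expansion

    for _ in range(max_expansions):
        s = random.choice(frontier)   # paper: expansion guided by curiosity
        a = sample_action(s)          # paper: action proposals from learned networks
        s_next = f(s, a)              # one step of the (simulated) dynamics

        parents[id(s_next)] = (s, a)
        frontier.append(s_next)

        if in_goal_set(s_next):       # goal reached: walk back up the tree
            path, child = [], s_next
            while parents[id(child)] is not None:
                parent, action = parents[id(child)]
                path.append((parent, action, child))
                child = parent
            return list(reversed(path))

    return None                       # no plan found within the budget
```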
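For the Experiment Setup row, the quoted architecture (three layers of 64 hidden units with tanh activations, trained with PPO) can be made concrete as below. This is a sketch only: PyTorch is assumed although the paper does not name a framework, 'three-layer' is read here as three linear layers, and the input/output sizes, learning rate, and optimizer are placeholders because those values are not reported in the main text.

```python
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 64) -> nn.Sequential:
    """Three linear layers with 64 hidden units and tanh activations."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


obs_dim, act_dim = 32, 7                  # placeholder sizes (e.g. 7 joints)
actor = mlp(obs_dim, act_dim)             # action-selection network
critic = mlp(obs_dim, 1)                  # value network for the PPO critic

# Placeholder optimizer: the learning rate is not reported in the main text.
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4
)
```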