Flexible and Efficient Long-Range Planning Through Curious Exploration
Authors: Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, Daniel Yamins
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples. We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks. |
| Researcher Affiliation | Academia | ¹Rice University, ²Shanghai Jiao Tong University, ³Stanford University. Correspondence to: Aidan Curtis <curtisa@mit.edu>. |
| Pseudocode | Yes | Algorithm 1: The CSP algorithm. Input: initial state s_0, goal set G, dynamics f. Output: path {(s_0, a_0, s_1), ..., (s_{n-1}, a_{n-1}, s_n)} where s_n ∈ G and f(s_i, a_i) = s_{i+1}. (A hedged sketch of such a planning loop appears after this table.) |
| Open Source Code | No | The paper mentions a link to a video ('https://youtu.be/7DSW8Dy9ADQ') but does not provide any explicit statement or link for the open-source code of their methodology. |
| Open Datasets | No | The paper describes a simulated environment using Bullet (Coumans, 2015) and pybullet-planning library (Garrett, 2018) for its experiments, but it does not specify any publicly available or open dataset used for training, nor does it provide a link, DOI, or formal citation for such a dataset. |
| Dataset Splits | No | The paper describes training processes for neural networks (e.g., action selection networks are trained using PPO) and mentions 'validation' in the context of network settings, but it does not provide explicit dataset splits (e.g., percentages or sample counts) for training, validation, and test sets. The experiments are conducted in a simulated environment rather than on a pre-defined static dataset with such splits. |
| Hardware Specification | No | The paper describes the simulated robot arm and environment (e.g., 'a mounted robot arm with seven bounded revolute joints'), but it does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used to run the experiments. |
| Software Dependencies | No | The paper mentions software components such as Bullet (Coumans, 2015), the pybullet-planning library (Garrett, 2018), and Proximal Policy Optimization (PPO) (Schulman et al., 2017), but it does not give version numbers for these dependencies, which would be needed for exact replication. |
| Experiment Setup | No | The paper provides some neural network architecture details, such as 'three-layer networks with 64 hidden units each, using the tanh activation function,' and states that 'The networks are trained using actor-critic reinforcement learning (namely, Proximal Policy Optimization (PPO) (Schulman et al., 2017)),' but it does not specify concrete hyperparameter values such as learning rate, batch size, or number of epochs in the main text; further implementation details are deferred to the supplement. (A hedged sketch of the described network setup follows this table.) |
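The Algorithm 1 signature quoted in the Pseudocode row only fixes the planner's inputs and outputs. Below is a minimal, hypothetical sketch of a forward-search loop consistent with that signature; the uniform-random sampling stands in for the paper's curiosity-driven expansion, and `sample_action`, `in_goal_set`, and the tree bookkeeping are illustrative names rather than the authors' implementation.

```python
import random


def plan(s0, in_goal_set, f, sample_action, max_expansions=10_000):
    """Search forward from s0 until a state in the goal set is reached.

    Returns a list of (s_i, a_i, s_{i+1}) transitions whose final state lies
    in the goal set, or None if the expansion budget is exhausted.
    """
    parents = {id(s0): None}          # child-state id -> (parent state, action)
    frontier = [s0]                   # states available for expansion

    for _ in range(max_expansions):
        s = random.choice(frontier)   # paper: expansion guided by curiosity
        a = sample_action(s)          # paper: action proposals from learned networks
        s_next = f(s, a)              # one step of the (simulated) dynamics

        parents[id(s_next)] = (s, a)
        frontier.append(s_next)

        if in_goal_set(s_next):       # goal reached: walk back up the tree
            path, child = [], s_next
            while parents[id(child)] is not None:
                parent, action = parents[id(child)]
                path.append((parent, action, child))
                child = parent
            return list(reversed(path))

    return None                       # no plan found within the budget
```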
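For the Experiment Setup row, the quoted architecture (three layers of 64 hidden units with tanh activations, trained with PPO) can be made concrete as below. This is a sketch only: PyTorch is assumed although the paper does not name a framework, 'three-layer' is read here as three linear layers, and the input/output sizes, learning rate, and optimizer are placeholders because those values are not reported in the main text.

```python
import torch
import torch.nn as nn


def mlp(in_dim: int, out_dim: int, hidden: int = 64) -> nn.Sequential:
    """Three linear layers with 64 hidden units and tanh activations."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.Tanh(),
        nn.Linear(hidden, hidden), nn.Tanh(),
        nn.Linear(hidden, out_dim),
    )


obs_dim, act_dim = 32, 7                  # placeholder sizes (e.g. 7 joints)
actor = mlp(obs_dim, act_dim)             # action-selection network
critic = mlp(obs_dim, 1)                  # value network for the PPO critic

# Placeholder optimizer: the learning rate is not reported in the main text.
optimizer = torch.optim.Adam(
    list(actor.parameters()) + list(critic.parameters()), lr=3e-4
)
```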