Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics
Authors: Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning. |
| Researcher Affiliation | Academia | Ingmar Schubert¹, Danny Driess¹, Ozgur S. Oguz², and Marc Toussaint¹. ¹ Learning and Intelligent Systems Group, TU Berlin, Germany; ² Machine Learning and Robotics Lab, University of Stuttgart, Germany |
| Pseudocode | Yes | Algorithm 1: Learning to Execute (L2E) |
| Open Source Code | Yes | The complete code to fully reproduce the figures in this paper from scratch can be found at github.com/ischubert/l2e and in the supplementary material. |
| Open Datasets | No | The paper describes custom simulated environments (basic pushing and obstacle pushing) built on the NVIDIA PhysX engine. While the code to reproduce these environments is open source, the paper does not state the use of, or provide access information for, a pre-existing, publicly available dataset distinct from the simulation itself. |
| Dataset Splits | No | The paper describes experiments conducted in a simulated environment where data is generated dynamically. It does not provide specific training/validation/test dataset splits as it does not rely on a fixed, pre-existing dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only states that the experiments were run in simulation. |
| Software Dependencies | No | The paper mentions software components such as the NVIDIA PhysX engine and cites PyTorch and Stable-Baselines3 in the references, but it does not specify version numbers for these or other software dependencies used in the experiments. |
| Experiment Setup | Yes | Both at training and evaluation time, we run episodes of length 250. For all experiments, we use the Soft Actor-Critic (SAC) algorithm as implemented in stable-baselines3 (Raffin et al., 2019). We use a discount factor of γ = 0.99 for all experiments. The parameters for the neural networks are described in section A.8. Neural network architectures use ReLU activation functions, batch sizes of 256, and learning rates of 0.0003. |
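
The Experiment Setup row quotes the SAC hyperparameters reported by the authors: episodes of length 250, discount factor γ = 0.99, ReLU activations, batch size 256, learning rate 0.0003, and SAC as implemented in stable-baselines3. The snippet below is a minimal sketch of how such a configuration can be assembled with stable-baselines3; it assumes stable-baselines3 ≥ 2.0 with the Gymnasium API and uses Pendulum-v1 only as a stand-in environment, since the paper's plan-conditioned pushing environments and the L2E plan-relabeling logic live in the authors' repository (github.com/ischubert/l2e) and are not reproduced here.

```python
# Minimal sketch: SAC configured with the hyperparameters quoted in the
# Experiment Setup row. Pendulum-v1 is a stand-in environment, not one of
# the paper's pushing tasks. Assumes stable-baselines3 >= 2.0 (Gymnasium API)
# and PyTorch are installed.
import gymnasium as gym
import torch
from stable_baselines3 import SAC

# Episodes of length 250, as stated in the paper's experiment setup.
env = gym.make("Pendulum-v1", max_episode_steps=250)

model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # 0.0003, as reported
    batch_size=256,       # as reported
    gamma=0.99,           # discount factor, as reported
    policy_kwargs=dict(activation_fn=torch.nn.ReLU),  # ReLU activations
    verbose=1,
)

# Training budget is illustrative only; the paper's budgets are
# environment-specific and not reproduced here.
model.learn(total_timesteps=50_000)
```

The total-timestep count in `model.learn` is purely illustrative; none of the paper's reported results correspond to this stand-in environment or training budget.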