Learning to Execute: Efficient Learning of Universal Plan-Conditioned Policies in Robotics
Authors: Ingmar Schubert, Danny Driess, Ozgur S. Oguz, Marc Toussaint
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our robotic manipulation experiments, L2E exhibits increased performance when compared to pure RL, pure planning, or baseline methods combining learning and planning. |
| Researcher Affiliation | Academia | Ingmar Schubert¹, Danny Driess¹, Ozgur S. Oguz², and Marc Toussaint¹. ¹ Learning and Intelligent Systems Group, TU Berlin, Germany; ² Machine Learning and Robotics Lab, University of Stuttgart, Germany |
| Pseudocode | Yes | Algorithm 1: Learning to Execute (L2E) |
| Open Source Code | Yes | The complete code to fully reproduce the figures in this paper from scratch can be found at github.com/ischubert/l2e and in the supplementary material. |
| Open Datasets | No | The paper describes custom simulated environments (basic pushing and obstacle pushing) built on the NVIDIA PhysX engine. While the code to reproduce these environments is open source, the paper does not state the use of, or provide access information for, a pre-existing, publicly available dataset distinct from the simulation itself. |
| Dataset Splits | No | The paper describes experiments conducted in a simulated environment where data is generated dynamically. It does not provide specific training/validation/test dataset splits as it does not rely on a fixed, pre-existing dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only states that the experiments were run in simulation. |
| Software Dependencies | No | The paper mentions software components such as the NVIDIA PhysX engine and cites PyTorch and Stable-Baselines3 in the references, but it does not specify version numbers for these or other software dependencies used in the experiments. |
| Experiment Setup | Yes | Both at training and evaluation time, we run episodes of length 250. For all experiments, we use the Soft Actor-Critic (SAC) algorithm as implemented in stable-baselines3 (Raffin et al., 2019). We use a discount factor of γ = 0.99 for all experiments. The parameters for the neural networks are described in section A.8. Neural network architectures use ReLU activation functions, batch sizes of 256, and learning rates of 0.0003. |
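
The Experiment Setup row quotes the SAC hyperparameters reported by the authors: episodes of length 250, discount factor γ = 0.99, ReLU activations, batch size 256, learning rate 0.0003, and SAC as implemented in stable-baselines3. The snippet below is a minimal sketch of how such a configuration can be assembled with stable-baselines3; it assumes stable-baselines3 ≥ 2.0 with the Gymnasium API and uses Pendulum-v1 only as a stand-in environment, since the paper's plan-conditioned pushing environments and the L2E plan-relabeling logic live in the authors' repository (github.com/ischubert/l2e) and are not reproduced here.

```python
# Minimal sketch: SAC configured with the hyperparameters quoted in the
# Experiment Setup row. Pendulum-v1 is a stand-in environment, not one of
# the paper's pushing tasks. Assumes stable-baselines3 >= 2.0 (Gymnasium API)
# and PyTorch are installed.
import gymnasium as gym
import torch
from stable_baselines3 import SAC

# Episodes of length 250, as stated in the paper's experiment setup.
env = gym.make("Pendulum-v1", max_episode_steps=250)

model = SAC(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # 0.0003, as reported
    batch_size=256,       # as reported
    gamma=0.99,           # discount factor, as reported
    policy_kwargs=dict(activation_fn=torch.nn.ReLU),  # ReLU activations
    verbose=1,
)

# Training budget is illustrative only; the paper's budgets are
# environment-specific and not reproduced here.
model.learn(total_timesteps=50_000)
```

The total-timestep count in `model.learn` is purely illustrative; none of the paper's reported results correspond to this stand-in environment or training budget.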