Sample Efficient Path Integral Control under Uncertainty
Authors: Yunpeng Pan, Evangelos Theodorou, Michail Kontitsis
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental results on three different tasks and comparisons with state-of-the-art model-based methods to demonstrate the efficiency and generalizability of the proposed framework. |
| Researcher Affiliation | Academia | Yunpeng Pan, Evangelos A. Theodorou, and Michail Kontitsis; Autonomous Control and Decision Systems Laboratory, Institute for Robotics and Intelligent Machines, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332; {ypan37,evangelos.theodorou,kontitsis}@gatech.edu |
| Pseudocode | Yes | Algorithm 1 Sample efficient path integral control under uncertain dynamics |
| Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology. |
| Open Datasets | Yes | We consider 3 simulated RL tasks: cart-pole (CP) swing up, double pendulum on a cart (DPC) swing up, and PUMA-560 robotic arm reaching. |
| Dataset Splits | No | The paper describes the number of sample rollouts used for initialization and trials, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for a fixed dataset, nor does it reference predefined splits for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries/solvers). |
| Experiment Setup | Yes | For both tasks we choose T = 1.2 and dt = 0.02 (60 time steps per rollout). The iterative PI [18] with a given dynamics model uses 10^3/10^4 (CP/DPC) sample rollouts per iteration and 500 iterations at each time step. We initialize PILCO and the proposed method by collecting 2/6 sample rollouts (corresponding to 120/360 transition samples) for CP/DPC tasks respectively. At each trial (on the true dynamics model), we use 1 sample rollout for PILCO and our method. PDDP uses 4/5 rollouts (corresponding to 240/300 transition samples) for initialization as well as at each trial for the CP/DPC tasks. ... For all tasks we initialize with 3 sample rollouts and 1 sample at each trial. |
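
The transition-sample counts quoted in the Experiment Setup row follow from the horizon and step size: each rollout has T / dt = 60 steps, so the number of transition samples is the rollout count times 60. The short Python sketch below only reproduces that arithmetic; the function names `steps_per_rollout` and `transition_samples` are illustrative and do not come from the paper.

```python
# Illustrative arithmetic check of the quoted setup (names are hypothetical,
# not taken from the paper or any released code).

def steps_per_rollout(horizon: float, dt: float) -> int:
    """Number of discrete time steps in one rollout of length `horizon`."""
    # round() guards against floating-point error in the division.
    return round(horizon / dt)


def transition_samples(rollouts: int, horizon: float = 1.2, dt: float = 0.02) -> int:
    """Total transition samples collected from `rollouts` rollouts."""
    return rollouts * steps_per_rollout(horizon, dt)


# Rollout counts quoted for the CP / DPC tasks.
quoted_rollouts = {
    "PILCO and proposed method, initialization": (2, 6),
    "PDDP, initialization and each trial": (4, 5),
}

samples = {
    label: tuple(transition_samples(n) for n in counts)
    for label, counts in quoted_rollouts.items()
}
```

With the quoted values, `samples` holds (120, 360) for the first entry and (240, 300) for the second, matching the figures in the table.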