Sample Efficient Path Integral Control under Uncertainty

Authors: Yunpeng Pan, Evangelos Theodorou, Michail Kontitsis

NeurIPS 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide experimental results on three different tasks and comparisons with state-of-the-art model-based methods to demonstrate the efficiency and generalizability of the proposed framework.
Researcher Affiliation | Academia | Yunpeng Pan, Evangelos A. Theodorou, and Michail Kontitsis; Autonomous Control and Decision Systems Laboratory, Institute for Robotics and Intelligent Machines, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332; {ypan37,evangelos.theodorou,kontitsis}@gatech.edu
Pseudocode | Yes | Algorithm 1: Sample efficient path integral control under uncertain dynamics
Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology.
Open Datasets | Yes | We consider 3 simulated RL tasks: cart-pole (CP) swing up, double pendulum on a cart (DPC) swing up, and PUMA-560 robotic arm reaching.
Dataset Splits | No | The paper describes the number of sample rollouts used for initialization and trials, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or counts for a fixed dataset, nor does it reference predefined splits for reproducibility.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions or other libraries/solvers).
Experiment Setup | Yes | For both tasks we choose T = 1.2 and dt = 0.02 (60 time steps per rollout). The iterative PI [18] with a given dynamics model uses 10^3/10^4 (CP/DPC) sample rollouts per iteration and 500 iterations at each time step. We initialize PILCO and the proposed method by collecting 2/6 sample rollouts (corresponding to 120/360 transition samples) for CP/DPC tasks respectively. At each trial (on the true dynamics model), we use 1 sample rollout for PILCO and our method. PDDP uses 4/5 rollouts (corresponding to 240/300 transition samples) for initialization as well as at each trial for the CP/DPC tasks. ... For all tasks we initialize with 3 sample rollouts and 1 sample at each trial.
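
The sample counts quoted in the Experiment Setup row can be checked with a few lines of arithmetic. The Python sketch below is only an illustration and is not code from the paper: the variable and function names are hypothetical, and the per-iteration rollout counts for iterative PI are taken to be 10^3 (CP) and 10^4 (DPC), which is an assumption about the quoted figures.

```python
# Horizon and step size quoted for the CP and DPC tasks.
T = 1.2    # rollout duration
DT = 0.02  # time step
STEPS_PER_ROLLOUT = round(T / DT)  # number of transitions in one rollout


def transitions(rollouts: int, steps: int = STEPS_PER_ROLLOUT) -> int:
    """Transition samples produced by a given number of rollouts."""
    return rollouts * steps


# Initialization rollouts per method and task, as quoted in the setup row.
# The dictionary keys are illustrative labels, not identifiers from the paper.
INIT_ROLLOUTS = {
    "PILCO / proposed": {"CP": 2, "DPC": 6},
    "PDDP": {"CP": 4, "DPC": 5},
}

init_transitions = {
    method: {task: transitions(n) for task, n in per_task.items()}
    for method, per_task in INIT_ROLLOUTS.items()
}

# Iterative PI with a given dynamics model: rollouts per iteration
# (assumed to be 10^3 for CP and 10^4 for DPC) times 500 iterations
# at each time step.
PI_ROLLOUTS_PER_ITERATION = {"CP": 10**3, "DPC": 10**4}
PI_ITERATIONS = 500
pi_rollouts_per_time_step = {
    task: n * PI_ITERATIONS for task, n in PI_ROLLOUTS_PER_ITERATION.items()
}

if __name__ == "__main__":
    print("steps per rollout:", STEPS_PER_ROLLOUT)
    print("initial transition samples:", init_transitions)
    print("iterative PI rollouts per time step:", pi_rollouts_per_time_step)
```

Under these assumptions the script reproduces the 60 time steps per rollout and the 120/360 and 240/300 transition counts quoted above. It also shows the gap in sample budget between iterative PI with a known model and the methods that learn from a handful of rollouts.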