Predictor-Corrector Policy Optimization

Authors: Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PICCOLO. ... To validate the theory, we PICCOLO multiple algorithms in simulation. The experimental results show that the PICCOLOed versions consistently surpass the base algorithm and are robust to model errors.
Researcher Affiliation | Collaboration | Ching-An Cheng (1,2), Xinyan Yan (1), Nathan Ratliff (2), Byron Boots (1,2); 1: Georgia Tech, 2: NVIDIA. Correspondence to: Ching-An Cheng <cacheng@gatech.edu>.
Pseudocode | Yes | Algorithm 1 (PICCOLO). (An illustrative predictor-corrector sketch follows the table.)
Open Source Code | Yes | The codes are available at https://github.com/gtrll/rlfamily.
Open Datasets | Yes | Robot RL tasks (Cart Pole, Hopper, Snake, and Walker3D) from OpenAI Gym (Brockman et al., 2016) with the DART physics engine (Lee et al., 2018).
Dataset Splits | No | The paper mentions using OpenAI Gym environments but does not specify exact training, validation, or test dataset splits or percentages.
Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as exact GPU or CPU models.
Software Dependencies | No | The paper mentions implementing the algorithm using PyTorch and OpenAI Gym, but it does not specify concrete version numbers for these software dependencies (e.g., 'PyTorch 1.9' or 'OpenAI Gym X.Y').
Experiment Setup | Yes | We implement our algorithm using PyTorch (Paszke et al., 2017) and OpenAI Gym (Brockman et al., 2016). For the Adam optimizer, we use the default parameters in PyTorch (β1 = 0.9, β2 = 0.999, ε = 10^-8). ... ADAM learning rate 0.001. ... Each iteration of the algorithms involves collecting 2000 steps from 10 rollouts. ... For the TRPO and NATGRAD algorithms, we use the implementation from OpenAI Baselines (Dhariwal et al., 2017) with default hyperparameters. (A minimal optimizer-configuration sketch follows the table.)
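
The Pseudocode row refers to Algorithm 1 (PICCOLO) in the paper. As a rough illustration of the predictor-corrector idea, the sketch below applies a two-step update, a prediction step using a model-based gradient estimate followed by a correction step using the error between the sampled and predicted gradients, to plain gradient descent. The function names, the quadratic toy problem, and the biased gradient model are assumptions for illustration; this is not the paper's implementation.

```python
import numpy as np

def piccolo_gradient_descent(x0, sample_grad, predict_grad, lr, n_iters):
    """Minimal predictor-corrector sketch on top of vanilla gradient descent.

    sample_grad(x)  -- first-order feedback measured at x (e.g., a sampled policy gradient)
    predict_grad(x) -- model-based guess of that feedback at x
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iters):
        # Prediction step: move using the model's gradient estimate.
        g_hat = predict_grad(x)
        x_hat = x - lr * g_hat
        # Correction step: measure the true feedback at the predicted point
        # and move by the prediction error.
        g = sample_grad(x_hat)
        x = x_hat - lr * (g - g_hat)
    return x

# Toy usage: minimize ||x||^2 with an intentionally biased gradient model.
true_grad = lambda x: 2.0 * x
model_grad = lambda x: 1.5 * x
print(piccolo_gradient_descent(np.ones(3), true_grad, model_grad, lr=0.1, n_iters=100))
```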
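The Experiment Setup row quotes the Adam hyperparameters, the learning rate, and the per-iteration sample budget. Below is a minimal PyTorch sketch of that optimizer configuration under stated assumptions: the small policy network is a placeholder, and only the Adam parameters (β1 = 0.9, β2 = 0.999, ε = 1e-8) and the learning rate 0.001 come from the quoted text.

```python
import torch
import torch.nn as nn

# Placeholder policy network; the architecture is an assumption, not from the paper.
policy = nn.Sequential(nn.Linear(11, 64), nn.Tanh(), nn.Linear(64, 3))

# Adam with the PyTorch defaults quoted in the setup row:
# lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)

# Per the setup row, each training iteration would consume roughly
# 2000 environment steps gathered from 10 rollouts before an update.
```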