Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Predictor-Corrector Policy Optimization

**Authors:** Ching-An Cheng, Xinyan Yan, Nathan Ratliff, Byron Boots

ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show, in both theory and simulation, that the convergence rate of several first-order model-free algorithms can be improved by PICCOLO. ... To validate the theory, we PICCOLO multiple algorithms in simulation. The experimental results show that the PICCOLOed versions consistently surpass the base algorithm and are robust to model errors. |
| Researcher Affiliation | Collaboration | Ching-An Cheng¹², Xinyan Yan¹, Nathan Ratliff², Byron Boots¹² (¹Georgia Tech, ²NVIDIA). Correspondence to: Ching-An Cheng <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 PICCOLO (a sketch of the predictor-corrector update appears after the table) |
| Open Source Code | Yes | The codes are available at https://github.com/gtrll/rlfamily. |
| Open Datasets | Yes | robot RL tasks (Cart Pole, Hopper, Snake, and Walker3D) from OpenAI Gym (Brockman et al., 2016) with the DART physics engine (Lee et al., 2018) |
| Dataset Splits | No | The paper mentions using OpenAI Gym environments but does not specify exact training, validation, or test dataset splits or percentages. |
| Hardware Specification | No | The paper does not provide specific details on the hardware used for running experiments, such as exact GPU or CPU models. |
| Software Dependencies | No | The paper mentions implementing the algorithm using PyTorch and OpenAI Gym, but it does not specify concrete version numbers for these software dependencies (e.g., 'PyTorch 1.9' or 'OpenAI Gym X.Y'). |
| Experiment Setup | Yes | We implement our algorithm using PyTorch (Paszke et al., 2017) and OpenAI Gym (Brockman et al., 2016). For the Adam optimizer, we use the default parameters in PyTorch (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸). ... ADAM learning rate 0.001. ... Each iteration of the algorithms involves collecting 2000 steps from 10 rollouts. ... For the TRPO and NATGRAD algorithms, we use the implementation from OpenAI Baselines (Dhariwal et al., 2017) with default hyperparameters. (A sketch of this configuration also follows the table.) |
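
The Pseudocode row points to Algorithm 1 (PICCOLO). Below is a minimal sketch of the predictor-corrector scheme that algorithm describes: take a prediction step along a model-based gradient estimate, observe the true gradient at the predicted point, then take a correction step along the prediction error. The helpers `base_update`, `predict_grad`, and `sample_grad` are hypothetical stand-ins introduced here for illustration; this is not the authors' released implementation (see https://github.com/gtrll/rlfamily for that).

```python
import torch

def piccolo_step(x, base_update, predict_grad, sample_grad):
    """One round of the predictor-corrector scheme (a sketch, not the
    authors' code). base_update(x, g) applies one step of the base
    first-order algorithm; predict_grad / sample_grad are hypothetical
    callables returning the model-predicted and sampled gradients.
    """
    g_hat = predict_grad(x)
    x = base_update(x, g_hat)       # Prediction: step along the model's estimate
    g = sample_grad(x)              # Observe the true gradient at the predicted point
    x = base_update(x, g - g_hat)   # Correction: step along the prediction error
    return x

# Toy usage: plain gradient descent as the base algorithm and a quadratic
# loss standing in for the policy-optimization objective.
lr = 0.1
base_update = lambda x, g: x - lr * g
true_grad = lambda x: 2.0 * x       # gradient of x^2
model_grad = lambda x: 1.8 * x      # deliberately imperfect model gradient
x = torch.tensor([1.0])
for _ in range(50):
    x = piccolo_step(x, base_update, model_grad, true_grad)
print(x)  # converges toward the minimizer 0
```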
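
The Experiment Setup row also translates directly into code. The sketch below wires up the reported configuration: Adam with PyTorch defaults (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸), learning rate 0.001, and iterations that collect 2000 steps from 10 rollouts. A standard Gym environment (written against the pre-0.26 Gym API current at publication) stands in for the paper's DART-based tasks, and the policy network architecture is a hypothetical placeholder, since this excerpt does not specify one.

```python
import gym
import torch

# Stand-in environment; the paper uses DART-based versions of these tasks.
env = gym.make("Hopper-v2")
obs_dim = env.observation_space.shape[0]
act_dim = env.action_space.shape[0]

# Hypothetical policy network (architecture not specified in this excerpt).
policy = torch.nn.Sequential(torch.nn.Linear(obs_dim, 64), torch.nn.Tanh(),
                             torch.nn.Linear(64, act_dim))

# Adam with the reported settings; PyTorch's defaults already match
# beta1 = 0.9, beta2 = 0.999, eps = 1e-8, and the learning rate is 0.001.
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3,
                             betas=(0.9, 0.999), eps=1e-8)

# One iteration collects 2000 steps from 10 rollouts (200 steps each).
steps_per_iter, num_rollouts = 2000, 10
for _ in range(num_rollouts):
    obs = env.reset()
    for _ in range(steps_per_iter // num_rollouts):
        with torch.no_grad():
            action = policy(torch.as_tensor(obs, dtype=torch.float32))
        obs, reward, done, _ = env.step(action.numpy())
        if done:
            break
```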