Phasic Policy Gradient
Authors: Karl W Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We report results on the environments in Procgen Benchmark (Cobbe et al., 2019). This benchmark was designed to be highly diverse, and we expect improvements on this benchmark to transfer well to many other RL environments. In each Procgen environment, we train and evaluate agents on the full distribution of levels. Throughout all experiments, we use the hyperparameters found in Appendix A unless otherwise specified. When feasible, we compute and visualize the standard deviation across 3 separate runs. |
| Researcher Affiliation | Industry | OpenAI, San Francisco, CA, USA. Correspondence to: Karl Cobbe <karl@openai.com>. |
| Pseudocode | Yes | Algorithm 1 PPG (a minimal sketch of the algorithm's phase structure is given below the table) |
| Open Source Code | Yes | Code for PPG can be found at https://github.com/openai/phasic-policy-gradient. |
| Open Datasets | Yes | We report results on the environments in Procgen Benchmark (Cobbe et al., 2019). |
| Dataset Splits | No | The paper states that agents are trained and evaluated on the "full distribution of levels" in each Procgen environment, but it does not specify explicit numerical splits (e.g., 80/10/10) or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper mentions "Each experiment required between 10 and 50 GPU-hours per run per environment," but does not specify the exact models of GPUs, CPUs, or other hardware components used. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Default values for all hyperparameters can be found in Appendix A. Code for PPG can be found at https://github.com/openai/phasic-policy-gradient. |
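
The Pseudocode row above refers to Algorithm 1 (PPG), which alternates a policy phase of PPO-style updates with an auxiliary phase that distills value-function knowledge into the policy network under a behavioral-cloning KL penalty. Below is a minimal, runnable PyTorch sketch of that phase structure, written against a dummy environment with random dynamics. The phase-length hyperparameters (N_pi, E_pi, E_V, E_aux, beta_clone) follow the defaults the paper reports in Appendix A; the tiny networks, the fake rollouts, and the remaining constants are illustrative assumptions rather than the authors' implementation (see the linked repository for that).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, N_ACTIONS, T = 8, 4, 64            # dummy environment sizes (assumed)
N_PI, E_PI, E_V, E_AUX = 32, 1, 1, 6        # phase lengths reported in Appendix A
BETA_CLONE, CLIP_EPS, GAMMA, LAM = 1.0, 0.2, 0.999, 0.95

class PolicyNet(nn.Module):
    """Policy network with an auxiliary value head (used only in the auxiliary phase)."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.pi_head = nn.Linear(64, N_ACTIONS)
        self.aux_v_head = nn.Linear(64, 1)

    def forward(self, obs):
        h = self.body(obs)
        return F.log_softmax(self.pi_head(h), dim=-1), self.aux_v_head(h).squeeze(-1)

policy_net = PolicyNet()
value_net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
opt_pi = torch.optim.Adam(policy_net.parameters(), lr=5e-4)
opt_v = torch.optim.Adam(value_net.parameters(), lr=5e-4)

def rollout():
    """Fake rollout: random observations and rewards stand in for a Procgen environment."""
    obs = torch.randn(T, OBS_DIM)
    rewards = torch.randn(T)
    with torch.no_grad():
        logp, _ = policy_net(obs)
        actions = torch.distributions.Categorical(logits=logp).sample()
        logp_old = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
        values = value_net(obs).squeeze(-1)
    # Generalized advantage estimation; value targets are advantages + value predictions.
    adv, last = torch.zeros(T), 0.0
    for t in reversed(range(T)):
        next_v = values[t + 1] if t + 1 < T else 0.0
        delta = rewards[t] + GAMMA * next_v - values[t]
        last = delta + GAMMA * LAM * last
        adv[t] = last
    return obs, actions, logp_old, adv, adv + values

for phase in range(2):  # a couple of PPG phases, for illustration
    buffer = []
    # Policy phase: N_pi iterations of PPO-style updates on fresh rollouts.
    for _ in range(N_PI):
        obs, actions, logp_old, adv, v_targ = rollout()
        for _ in range(E_PI):  # optimize the clipped surrogate w.r.t. policy parameters
            logp, _ = policy_net(obs)
            logp_a = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
            ratio = (logp_a - logp_old).exp()
            clipped = torch.clamp(ratio, 1 - CLIP_EPS, 1 + CLIP_EPS)
            loss_pi = -torch.min(ratio * adv, clipped * adv).mean()
            opt_pi.zero_grad(); loss_pi.backward(); opt_pi.step()
        for _ in range(E_V):  # optimize the value-function loss w.r.t. value parameters
            loss_v = 0.5 * (value_net(obs).squeeze(-1) - v_targ).pow(2).mean()
            opt_v.zero_grad(); loss_v.backward(); opt_v.step()
        buffer.append((obs, v_targ))

    # Auxiliary phase: distill value knowledge into the policy network while
    # constraining the policy with a behavioral-cloning KL penalty.
    all_obs = torch.cat([o for o, _ in buffer])
    all_vtarg = torch.cat([v for _, v in buffer])
    with torch.no_grad():
        logp_pi_old, _ = policy_net(all_obs)  # snapshot pi_old for the KL term
    for _ in range(E_AUX):
        logp_new, v_aux = policy_net(all_obs)
        kl = F.kl_div(logp_new, logp_pi_old, log_target=True, reduction="batchmean")
        loss_joint = 0.5 * (v_aux - all_vtarg).pow(2).mean() + BETA_CLONE * kl
        opt_pi.zero_grad(); loss_joint.backward(); opt_pi.step()
        loss_v = 0.5 * (value_net(all_obs).squeeze(-1) - all_vtarg).pow(2).mean()
        opt_v.zero_grad(); loss_v.backward(); opt_v.step()
```

The sketch processes the whole buffer in a single batch per epoch for brevity; the actual implementation minibatches both phases and uses Procgen's convolutional architecture.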