Phasic Policy Gradient

Authors: Karl W Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We report results on the environments in Procgen Benchmark (Cobbe et al., 2019). This benchmark was designed to be highly diverse, and we expect improvements on this benchmark to transfer well to many other RL environments. In each Procgen environment, we train and evaluate agents on the full distribution of levels. Throughout all experiments, we use the hyperparameters found in Appendix A unless otherwise specified. When feasible, we compute and visualize the standard deviation across 3 separate runs." |
| Researcher Affiliation | Industry | "OpenAI, San Francisco, CA, USA. Correspondence to: Karl Cobbe <karl@openai.com>." |
| Pseudocode | Yes | "Algorithm 1 PPG" (a hedged sketch of the two-phase loop is given after this table) |
| Open Source Code | Yes | "Code for PPG can be found at https://github.com/openai/phasic-policy-gradient." |
| Open Datasets | Yes | "We report results on the environments in Procgen Benchmark (Cobbe et al., 2019)." (see the environment-loading example after this table) |
| Dataset Splits | No | The paper states that agents are trained and evaluated on the "full distribution of levels" in the Procgen environments, but it does not specify explicit numerical splits (e.g., 80/10/10) or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper mentions that "Each experiment required between 10 and 50 GPU-hours per run per environment," but it does not specify the exact models of GPUs, CPUs, or other hardware components used. |
| Software Dependencies | No | The paper does not provide version numbers for any software dependencies, such as the programming language, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | "Default values for all hyperparameters can be found in Appendix A. Code for PPG can be found at https://github.com/openai/phasic-policy-gradient." |
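
Algorithm 1 in the paper alternates a PPO-style policy phase with an auxiliary distillation phase on disjoint policy and value networks. Below is a minimal PyTorch sketch of that two-phase loop, not the authors' implementation: the toy MLP networks, the random stand-in for environment rollouts, the one-step value targets (the paper uses GAE), and all hyperparameter values here are illustrative assumptions; the actual defaults live in the paper's Appendix A and the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

OBS_DIM, N_ACTIONS = 8, 4  # toy sizes; Procgen uses 64x64x3 images and a CNN

class PolicyNet(nn.Module):
    """Policy network, plus the auxiliary value head used in the aux phase."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.pi_head = nn.Linear(64, N_ACTIONS)
        self.aux_v_head = nn.Linear(64, 1)

    def forward(self, obs):
        h = self.body(obs)
        return Categorical(logits=self.pi_head(h)), self.aux_v_head(h).squeeze(-1)

class ValueNet(nn.Module):
    """Separate value network (PPG's disjoint architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def collect_rollout(policy, value_fn, T=128):
    """Stand-in for environment interaction: random observations and rewards,
    with one-step bootstrap targets instead of the paper's GAE, for brevity."""
    obs = torch.randn(T, OBS_DIM)
    with torch.no_grad():
        dist, _ = policy(obs)
        acts = dist.sample()
        logp = dist.log_prob(acts)
        v = value_fn(obs)
        v_targ = torch.randn(T) + 0.999 * v  # fake reward + bootstrap
        adv = v_targ - v
    return obs, acts, logp, adv, v_targ

policy, value_fn = PolicyNet(), ValueNet()
opt_pi = torch.optim.Adam(policy.parameters(), lr=5e-4)
opt_v = torch.optim.Adam(value_fn.parameters(), lr=5e-4)
# Illustrative values only; the paper's defaults are listed in its Appendix A.
N_PI, E_PI, E_V, E_AUX, BETA_CLONE, CLIP = 4, 1, 1, 6, 1.0, 0.2

for phase in range(3):
    buffer = []  # (obs, value target) pairs replayed in the auxiliary phase
    # ---- Policy phase: N_PI iterations of PPO-style updates ----
    for _ in range(N_PI):
        obs, acts, logp_old, adv, v_targ = collect_rollout(policy, value_fn)
        for _ in range(E_PI):  # clipped surrogate objective (entropy bonus omitted)
            dist, _ = policy(obs)
            ratio = (dist.log_prob(acts) - logp_old).exp()
            l_clip = -torch.min(ratio * adv,
                                ratio.clamp(1 - CLIP, 1 + CLIP) * adv).mean()
            opt_pi.zero_grad(); l_clip.backward(); opt_pi.step()
        for _ in range(E_V):  # value regression on the separate value net
            l_v = 0.5 * (value_fn(obs) - v_targ).pow(2).mean()
            opt_v.zero_grad(); l_v.backward(); opt_v.step()
        buffer.append((obs, v_targ))
    # ---- Auxiliary phase: distill value knowledge into the policy net ----
    with torch.no_grad():  # snapshot the policy for the behavioral-cloning KL
        old_logits = [policy(o)[0].logits for o, _ in buffer]
    for _ in range(E_AUX):
        for (obs, v_targ), old in zip(buffer, old_logits):
            dist, v_aux = policy(obs)
            l_aux = 0.5 * (v_aux - v_targ).pow(2).mean()
            kl = F.kl_div(dist.logits, old, log_target=True,
                          reduction="batchmean")  # KL(pi_old || pi)
            opt_pi.zero_grad(); (l_aux + BETA_CLONE * kl).backward(); opt_pi.step()
            l_v = 0.5 * (value_fn(obs) - v_targ).pow(2).mean()
            opt_v.zero_grad(); l_v.backward(); opt_v.step()
```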
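
For the dataset side, the Procgen Benchmark is distributed as the open-source `procgen` pip package, which registers its environments with Gym. A minimal loading example follows the package README: `num_levels=0` requests the unbounded level distribution that the paper trains and evaluates on, while the `distribution_mode` value shown is an assumption here rather than a quote from the paper.

```python
# pip install procgen
import gym

# num_levels=0 means the full (unbounded) distribution of procedurally
# generated levels; distribution_mode is assumed, not quoted from the paper.
env = gym.make("procgen:procgen-coinrun-v0",
               num_levels=0, start_level=0, distribution_mode="hard")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```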