Phasic Policy Gradient

Authors: Karl W Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

ICML 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We report results on the environments in Procgen Benchmark (Cobbe et al., 2019). This benchmark was designed to be highly diverse, and we expect improvements on this benchmark to transfer well to many other RL environments. In each Procgen environment, we train and evaluate agents on the full distribution of levels. Throughout all experiments, we use the hyperparameters found in Appendix A unless otherwise specified. When feasible, we compute and visualize the standard deviation across 3 separate runs." |
| Researcher Affiliation | Industry | "OpenAI, San Francisco, CA, USA. Correspondence to: Karl Cobbe <karl@openai.com>." |
| Pseudocode | Yes | "Algorithm 1 PPG" (a hedged sketch of the two-phase loop is given after this table) |
| Open Source Code | Yes | "Code for PPG can be found at https://github.com/openai/phasic-policy-gradient." |
| Open Datasets | Yes | "We report results on the environments in Procgen Benchmark (Cobbe et al., 2019)." (see the environment-loading example after this table) |
| Dataset Splits | No | The paper states that agents are trained and evaluated on the "full distribution of levels" in the Procgen environments, but it does not specify explicit numerical splits (e.g., 80/10/10) or sample counts for training, validation, or test sets. |
| Hardware Specification | No | The paper mentions that "Each experiment required between 10 and 50 GPU-hours per run per environment," but it does not specify the exact models of GPUs, CPUs, or other hardware components used. |
| Software Dependencies | No | The paper does not provide version numbers for any software dependencies, such as the programming language, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | "Default values for all hyperparameters can be found in Appendix A. Code for PPG can be found at https://github.com/openai/phasic-policy-gradient." |
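
Algorithm 1 in the paper alternates a PPO-style policy phase with an auxiliary distillation phase on disjoint policy and value networks. Below is a minimal PyTorch sketch of that two-phase loop, not the authors' implementation: the toy MLP networks, the random stand-in for environment rollouts, the one-step value targets (the paper uses GAE), and all hyperparameter values here are illustrative assumptions; the actual defaults live in the paper's Appendix A and the released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.distributions import Categorical

OBS_DIM, N_ACTIONS = 8, 4  # toy sizes; Procgen uses 64x64x3 images and a CNN

class PolicyNet(nn.Module):
    """Policy network, plus the auxiliary value head used in the aux phase."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh())
        self.pi_head = nn.Linear(64, N_ACTIONS)
        self.aux_v_head = nn.Linear(64, 1)

    def forward(self, obs):
        h = self.body(obs)
        return Categorical(logits=self.pi_head(h)), self.aux_v_head(h).squeeze(-1)

class ValueNet(nn.Module):
    """Separate value network (PPG's disjoint architecture)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

def collect_rollout(policy, value_fn, T=128):
    """Stand-in for environment interaction: random observations and rewards,
    with one-step bootstrap targets instead of the paper's GAE, for brevity."""
    obs = torch.randn(T, OBS_DIM)
    with torch.no_grad():
        dist, _ = policy(obs)
        acts = dist.sample()
        logp = dist.log_prob(acts)
        v = value_fn(obs)
        v_targ = torch.randn(T) + 0.999 * v  # fake reward + bootstrap
        adv = v_targ - v
    return obs, acts, logp, adv, v_targ

policy, value_fn = PolicyNet(), ValueNet()
opt_pi = torch.optim.Adam(policy.parameters(), lr=5e-4)
opt_v = torch.optim.Adam(value_fn.parameters(), lr=5e-4)
# Illustrative values only; the paper's defaults are listed in its Appendix A.
N_PI, E_PI, E_V, E_AUX, BETA_CLONE, CLIP = 4, 1, 1, 6, 1.0, 0.2

for phase in range(3):
    buffer = []  # (obs, value target) pairs replayed in the auxiliary phase
    # ---- Policy phase: N_PI iterations of PPO-style updates ----
    for _ in range(N_PI):
        obs, acts, logp_old, adv, v_targ = collect_rollout(policy, value_fn)
        for _ in range(E_PI):  # clipped surrogate objective (entropy bonus omitted)
            dist, _ = policy(obs)
            ratio = (dist.log_prob(acts) - logp_old).exp()
            l_clip = -torch.min(ratio * adv,
                                ratio.clamp(1 - CLIP, 1 + CLIP) * adv).mean()
            opt_pi.zero_grad(); l_clip.backward(); opt_pi.step()
        for _ in range(E_V):  # value regression on the separate value net
            l_v = 0.5 * (value_fn(obs) - v_targ).pow(2).mean()
            opt_v.zero_grad(); l_v.backward(); opt_v.step()
        buffer.append((obs, v_targ))
    # ---- Auxiliary phase: distill value knowledge into the policy net ----
    with torch.no_grad():  # snapshot the policy for the behavioral-cloning KL
        old_logits = [policy(o)[0].logits for o, _ in buffer]
    for _ in range(E_AUX):
        for (obs, v_targ), old in zip(buffer, old_logits):
            dist, v_aux = policy(obs)
            l_aux = 0.5 * (v_aux - v_targ).pow(2).mean()
            kl = F.kl_div(dist.logits, old, log_target=True,
                          reduction="batchmean")  # KL(pi_old || pi)
            opt_pi.zero_grad(); (l_aux + BETA_CLONE * kl).backward(); opt_pi.step()
            l_v = 0.5 * (value_fn(obs) - v_targ).pow(2).mean()
            opt_v.zero_grad(); l_v.backward(); opt_v.step()
```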
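
For the dataset side, the Procgen Benchmark is distributed as the open-source `procgen` pip package, which registers its environments with Gym. A minimal loading example follows the package README: `num_levels=0` requests the unbounded level distribution that the paper trains and evaluates on, while the `distribution_mode` value shown is an assumption here rather than a quote from the paper.

```python
# pip install procgen
import gym

# num_levels=0 means the full (unbounded) distribution of procedurally
# generated levels; distribution_mode is assumed, not quoted from the paper.
env = gym.make("procgen:procgen-coinrun-v0",
               num_levels=0, start_level=0, distribution_mode="hard")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()
```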