Implicit Curriculum in Procgen Made Explicit
Authors: Zhenxiong Tan, Kaixin Wang, Xinchao Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For our experiments, we used C-Procgen, which faithfully simulates the same game logic and context distributions as the easy mode of the original Procgen benchmark. By leveraging the flexible context control features of C-Procgen, we recorded key metrics such as loss, entropy, episode length, average score, and the number of samples for each context. This approach provides a more detailed view of learning progress across different contexts. Specifically, we select nine environments from Procgen due to their episodic contexts, which change with each game reset, resulting in unique configurations for each playthrough. We utilize the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017) in our reinforcement learning experiments. For each of these selected environments, we perform five individual runs, each encompassing 25 million steps, to ensure a comprehensive and robust analysis. |
| Researcher Affiliation | Academia | Zhenxiong Tan, National University of Singapore, zhenxiong@u.nus.edu; Kaixin Wang, National University of Singapore, kaixin96.wang@gmail.com; Xinchao Wang, National University of Singapore, xinchao@nus.edu.sg. Footnotes: Equal Contributions; Currently in Microsoft Research Asia, work done during his time at NUS; Corresponding Author. |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. It describes the methods verbally and uses mathematical formulas, but no structured algorithmic steps. |
| Open Source Code | Yes | The source code of C-Procgen can be found on GitHub: https://github.com/zxtan98/CProcgen |
| Open Datasets | Yes | One popular procedurally generated environment suite is the Procgen benchmark (Cobbe et al., 2020), which consists of 16 challenging Atari-like video games. |
| Dataset Splits | No | The paper describes the experimental setup for training reinforcement learning agents on procedurally generated environments. It specifies hyper-parameters for the PPO algorithm (Table 1) and that experiments involve "25 million steps" over "five individual runs" but does not specify fixed training/validation/test dataset splits. In this type of RL setup, levels are continuously generated rather than using static dataset splits. |
| Hardware Specification | Yes | On a server with 2 Intel Xeon CPU cores and 56GB RAM, the FPS of Procgen is around 750 and the FPS of C-Procgen is around 710. Regarding the training cost, training a PPO agent on an NVIDIA T4 GPU for 25M steps takes approximately 2.5 to 3 hours. |
| Software Dependencies | No | The paper mentions using "PyTorch codebase from Raileanu and Fergus (2021)", "PPO Schulman et al. (2017)", and the "IMPALA network architecture Espeholt et al. (2018)". However, it does not provide specific version numbers for PyTorch or any other software libraries or dependencies, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Unless otherwise stated, we follow the hyperparameters used in Cobbe et al. (2020) for the easy mode of Procgen, as summarized in Table 1. Table 1 (Hyperparameters and their values): γ = 0.999; λ = 0.95; # timesteps per rollout = 256; # epochs per rollout = 3; # minibatches per epoch = 8; entropy bonus = 0.01; clip range = 0.2; reward normalization = no; learning rate = 5e-4; # workers = 1; # environments per worker = 64; # total timesteps = 25M; optimizer = Adam; LSTM = no; frame stack = no. |
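
The setup quoted in the Research Type row (nine easy-mode Procgen games with episodic contexts, five runs of 25 million PPO steps each) can be illustrated against the public Procgen API. The sketch below is an assumption-laden illustration: it uses the original `procgen` package (Cobbe et al., 2020) and a placeholder list of game names, not the authors' C-Procgen fork or their exact environment selection.

```python
# Hedged sketch (not from the paper): instantiating easy-mode Procgen
# environments with the public `procgen` package. The authors' C-Procgen
# fork adds per-context control and logging that is not reproduced here;
# see https://github.com/zxtan98/CProcgen for its API.
import gym

# Illustrative subset of games; the paper selects nine environments with
# episodic contexts, whose exact list is not repeated in this sketch.
GAMES = ["coinrun", "ninja", "jumper"]
NUM_RUNS = 5  # five individual runs per environment, as quoted above

def make_easy_env(game: str) -> gym.Env:
    """Create an easy-mode Procgen env drawing from the full level
    distribution (num_levels=0 means unbounded procedural generation)."""
    return gym.make(
        f"procgen:procgen-{game}-v0",
        num_levels=0,
        start_level=0,
        distribution_mode="easy",
    )

if __name__ == "__main__":
    for game in GAMES:
        for run in range(NUM_RUNS):
            env = make_easy_env(game)
            obs = env.reset()            # classic gym API used by `procgen`
            print(game, run, obs.shape)  # (64, 64, 3) RGB frame
            env.close()
```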
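
The Table 1 hyperparameters map naturally onto a standard PPO implementation. The following sketch is a stand-in, not the authors' setup: it assumes stable-baselines3 (1.x, which targets the classic gym API) instead of the Raileanu and Fergus (2021) codebase, and SB3's default CNN policy instead of the IMPALA architecture; the minibatch size is derived from 64 environments × 256 rollout steps ÷ 8 minibatches.

```python
# Hedged sketch: Table 1 PPO hyperparameters expressed with stable-baselines3.
# This is an illustrative stand-in for the authors' codebase (Raileanu &
# Fergus, 2021, with an IMPALA CNN), assuming SB3 1.x and classic gym.
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

N_ENVS = 64                  # environments per worker
N_STEPS = 256                # timesteps per rollout
N_MINIBATCHES = 8            # minibatches per epoch
BATCH_SIZE = N_ENVS * N_STEPS // N_MINIBATCHES  # 2048

def make_env():
    # Single example game; the paper trains on nine such environments.
    return gym.make(
        "procgen:procgen-coinrun-v0",
        num_levels=0,
        distribution_mode="easy",
    )

vec_env = make_vec_env(make_env, n_envs=N_ENVS)

model = PPO(
    "CnnPolicy",             # SB3's NatureCNN, not the paper's IMPALA net
    vec_env,
    n_steps=N_STEPS,
    batch_size=BATCH_SIZE,
    n_epochs=3,              # epochs per rollout
    gamma=0.999,
    gae_lambda=0.95,
    ent_coef=0.01,           # entropy bonus
    clip_range=0.2,
    learning_rate=5e-4,      # reward normalization stays off (SB3 default)
)
model.learn(total_timesteps=25_000_000)  # 25M steps, one run
```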