Implicit Curriculum in Procgen Made Explicit
Authors: Zhenxiong Tan, Kaixin Wang, Xinchao Wang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For our experiments, we used C-Procgen, which faithfully simulates the same game logic and context distributions as the easy mode of the original Procgen benchmark. By leveraging the flexible context control features of C-Procgen, we recorded key metrics such as loss, entropy, episode length, average score, and the number of samples for each context. This approach provides a more detailed view of learning progress across different contexts. Specifically, we select nine environments from Procgen due to their episodic contexts, which change with each game reset, resulting in unique configurations for each playthrough. We utilize the Proximal Policy Optimization (PPO) algorithm (Schulman et al., 2017) in our reinforcement learning experiments. For each of these selected environments, we perform five individual runs, each encompassing 25 million steps, to ensure a comprehensive and robust analysis. |
| Researcher Affiliation | Academia | Zhenxiong Tan, National University of Singapore, zhenxiong@u.nus.edu; Kaixin Wang, National University of Singapore, kaixin96.wang@gmail.com; Xinchao Wang, National University of Singapore, xinchao@nus.edu.sg. Footnotes: Equal Contributions; Currently in Microsoft Research Asia, work done during his time at NUS; Corresponding Author. |
| Pseudocode | No | The paper does not include any pseudocode or clearly labeled algorithm blocks. It describes the methods verbally and uses mathematical formulas, but no structured algorithmic steps. |
| Open Source Code | Yes | The source code of C-Procgen can be found on GitHub: https://github.com/zxtan98/CProcgen |
| Open Datasets | Yes | One popular procedurally generated environment suite is the Procgen benchmark (Cobbe et al., 2020), which consists of 16 challenging Atari-like video games. |
| Dataset Splits | No | The paper describes the experimental setup for training reinforcement learning agents on procedurally generated environments. It specifies hyper-parameters for the PPO algorithm (Table 1) and that experiments involve "25 million steps" over "five individual runs" but does not specify fixed training/validation/test dataset splits. In this type of RL setup, levels are continuously generated rather than using static dataset splits. |
| Hardware Specification | Yes | On a server with 2 Intel Xeon CPU cores and 56GB RAM, the FPS of Procgen is around 750 and the FPS of C-Procgen is around 710. Regarding the training cost, training a PPO agent on an NVIDIA T4 GPU for 25M steps takes approximately 2.5 to 3 hours. |
| Software Dependencies | No | The paper mentions using "PyTorch codebase from Raileanu and Fergus (2021)", "PPO Schulman et al. (2017)", and the "IMPALA network architecture Espeholt et al. (2018)". However, it does not provide specific version numbers for PyTorch or any other software libraries or dependencies, which is required for a reproducible description of ancillary software. |
| Experiment Setup | Yes | Unless otherwise stated, we follow the hyperparameters used in Cobbe et al. (2020) for the easy mode of Procgen, as summarized in Table 1. Table 1 (Hyperparameters and their values): γ = 0.999; λ = 0.95; # timesteps per rollout = 256; # epochs per rollout = 3; # minibatches per epoch = 8; entropy bonus = 0.01; clip range = 0.2; reward normalization = no; learning rate = 5e-4; # workers = 1; # environments per worker = 64; # total timesteps = 25M; optimizer = Adam; LSTM = no; frame stack = no. |
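
The setup quoted in the Research Type row (nine easy-mode Procgen games with episodic contexts, five runs of 25 million PPO steps each) can be illustrated against the public Procgen API. The sketch below is an assumption-laden illustration: it uses the original `procgen` package (Cobbe et al., 2020) and a placeholder list of game names, not the authors' C-Procgen fork or their exact environment selection.

```python
# Hedged sketch (not from the paper): instantiating easy-mode Procgen
# environments with the public `procgen` package. The authors' C-Procgen
# fork adds per-context control and logging that is not reproduced here;
# see https://github.com/zxtan98/CProcgen for its API.
import gym

# Illustrative subset of games; the paper selects nine environments with
# episodic contexts, whose exact list is not repeated in this sketch.
GAMES = ["coinrun", "ninja", "jumper"]
NUM_RUNS = 5  # five individual runs per environment, as quoted above

def make_easy_env(game: str) -> gym.Env:
    """Create an easy-mode Procgen env drawing from the full level
    distribution (num_levels=0 means unbounded procedural generation)."""
    return gym.make(
        f"procgen:procgen-{game}-v0",
        num_levels=0,
        start_level=0,
        distribution_mode="easy",
    )

if __name__ == "__main__":
    for game in GAMES:
        for run in range(NUM_RUNS):
            env = make_easy_env(game)
            obs = env.reset()            # classic gym API used by `procgen`
            print(game, run, obs.shape)  # (64, 64, 3) RGB frame
            env.close()
```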
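
The Table 1 hyperparameters map naturally onto a standard PPO implementation. The following sketch is a stand-in, not the authors' setup: it assumes stable-baselines3 (1.x, which targets the classic gym API) instead of the Raileanu and Fergus (2021) codebase, and SB3's default CNN policy instead of the IMPALA architecture; the minibatch size is derived from 64 environments × 256 rollout steps ÷ 8 minibatches.

```python
# Hedged sketch: Table 1 PPO hyperparameters expressed with stable-baselines3.
# This is an illustrative stand-in for the authors' codebase (Raileanu &
# Fergus, 2021, with an IMPALA CNN), assuming SB3 1.x and classic gym.
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

N_ENVS = 64                  # environments per worker
N_STEPS = 256                # timesteps per rollout
N_MINIBATCHES = 8            # minibatches per epoch
BATCH_SIZE = N_ENVS * N_STEPS // N_MINIBATCHES  # 2048

def make_env():
    # Single example game; the paper trains on nine such environments.
    return gym.make(
        "procgen:procgen-coinrun-v0",
        num_levels=0,
        distribution_mode="easy",
    )

vec_env = make_vec_env(make_env, n_envs=N_ENVS)

model = PPO(
    "CnnPolicy",             # SB3's NatureCNN, not the paper's IMPALA net
    vec_env,
    n_steps=N_STEPS,
    batch_size=BATCH_SIZE,
    n_epochs=3,              # epochs per rollout
    gamma=0.999,
    gae_lambda=0.95,
    ent_coef=0.01,           # entropy bonus
    clip_range=0.2,
    learning_rate=5e-4,      # reward normalization stays off (SB3 default)
)
model.learn(total_timesteps=25_000_000)  # 25M steps, one run
```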