Prioritized Level Replay
Authors: Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PLR on several PCG environments with various combinations of scoring functions and prioritization schemes, and compare to the most common direct level sampling baseline of P_train(l \| Λ_train) = Uniform(l; Λ_train). We train and test on all 16 environments in the Procgen Benchmark on easy and hard difficulties, but focus discussion on the easy results, which allow direct comparison to several prior studies. |
| Researcher Affiliation | Collaboration | ¹Facebook AI Research, London, United Kingdom; ²University College London, London, United Kingdom. Correspondence to: Minqi Jiang <msj@fb.com>. |
| Pseudocode | Yes | Algorithm 1 Policy-gradient training loop with PLR; Algorithm 2 Experience collection with PLR |
| Open Source Code | Yes | Our code is available at https://github.com/facebookresearch/level-replay. |
| Open Datasets | Yes | We evaluate PLR on several PCG environments... We train and test on all 16 environments in the Procgen Benchmark... For Procgen, we use the same ResBlock architecture as Cobbe et al. (2020a) and train for 25M total steps on 200 levels on the easy setting as in the original baselines. |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly describe validation dataset splits. It evaluates performance on 'unseen test levels'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PPO with GAE for training' but does not specify versions for PPO, GAE, or any other software libraries or programming languages. |
| Experiment Setup | Yes | For Procgen, we use the same ResBlock architecture as Cobbe et al. (2020a) and train for 25M total steps on 200 levels on the easy setting as in the original baselines. For MiniGrid, we use a 3-layer CNN architecture based on Igl et al. (2019), and provide approximately 1000 levels of each difficulty per environment during training. Detailed descriptions of the environments, architectures, and hyperparameters used in our experiments (and how they were set or obtained) can be found in Appendix A. |
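
To make the pseudocode row above concrete, the sketch below illustrates the replay distribution that Algorithms 1 and 2 sample from, as described in the paper: rank-based prioritization of per-level scores with a temperature β, mixed with a staleness term weighted by a coefficient ρ. This is a minimal illustration, not the authors' released implementation (see the repository linked above); the function name and the default β and ρ values are assumptions made for the sake of the example.

```python
import numpy as np

def replay_distribution(scores, last_visit, episode, beta=0.1, rho=0.1):
    """Sketch of PLR's replay distribution P_replay over seen levels.

    scores:     per-level scores S_i (e.g., mean L1 value loss / |GAE|)
    last_visit: episode index C_i at which each level was last sampled
    episode:    current global episode count c
    beta:       temperature for rank-based prioritization (assumed value)
    rho:        staleness mixing coefficient (assumed value)
    """
    scores = np.asarray(scores, dtype=np.float64)

    # Rank-based score prioritization: h(S_i) = 1 / rank(S_i),
    # where the highest-scoring level gets rank 1.
    ranks = np.empty_like(scores)
    ranks[np.argsort(-scores)] = np.arange(1, len(scores) + 1)
    h = (1.0 / ranks) ** (1.0 / beta)
    p_score = h / h.sum()

    # Staleness-aware prioritization: favor levels whose scores are
    # oldest, so stale value estimates get refreshed.
    staleness = episode - np.asarray(last_visit, dtype=np.float64)
    total = staleness.sum()
    p_stale = staleness / total if total > 0 else np.full(len(scores), 1.0 / len(scores))

    # P_replay is a mixture of the two distributions.
    return (1.0 - rho) * p_score + rho * p_stale
```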
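
Algorithm 1's outer loop then decides, per episode, whether to visit an unseen level or replay a seen one from that distribution. The schematic below reuses `replay_distribution` from the sketch above; the fixed `p_replay` probability is a stand-in assumption for the paper's replay-decision distribution P_D, not a value taken from the paper.

```python
import random
import numpy as np

def sample_level(seen, unseen, scores, last_visit, episode, p_replay=0.5):
    """One level-sampling step of a PLR-style training loop (schematic).

    seen/unseen: lists of level identifiers
    scores, last_visit: dicts keyed by level id, maintained by the caller
    Returns (level, is_new). After collecting a trajectory on a new level,
    the caller is expected to record scores[level] and last_visit[level]
    before the next call.
    """
    if unseen and (not seen or random.random() > p_replay):
        # Visit a new level, sampled uniformly as in the baseline
        # P_train(l | Λ_train) = Uniform(l; Λ_train); it joins the seen set.
        level = unseen.pop(random.randrange(len(unseen)))
        seen.append(level)
        return level, True

    # Replay a seen level, drawn from the prioritized distribution.
    probs = replay_distribution(
        [scores[l] for l in seen],
        [last_visit[l] for l in seen],
        episode,
    )
    idx = np.random.choice(len(seen), p=probs)
    return seen[idx], False
```

Once every training level has been visited (e.g., all 200 Procgen easy-setting levels), the unseen list empties and every subsequent episode is drawn from the prioritized replay distribution.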