Evolving Curricula with Regret-Based Environment Design
Authors: Jack Parker-Holder, Minqi Jiang, Michael Dennis, Mikayel Samvelyan, Jakob Foerster, Edward Grefenstette, Tim Rocktäschel
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach, which we call Adversarially Compounding Complexity by Editing Levels (ACCEL), seeks to constantly produce levels at the frontier of the student agent's capabilities, resulting in curricula that start simple but become increasingly complex. ACCEL maintains the theoretical benefits of prior regret-based methods, while providing significant empirical gains in a diverse set of environments. An interactive version of this paper is available at https://accelagent.github.io. |
| Researcher Affiliation | Collaboration | *Equal contribution. 1Meta AI, 2University of Oxford, 3UCL, 4UC Berkeley. Correspondence to: Jack Parker-Holder <jackph@robots.ox.ac.uk>, Minqi Jiang <msj@fb.com>. |
| Pseudocode | Yes | The full procedure is shown in Algorithm 1. ACCEL can be seen as a UED algorithm taking a step toward open-ended evolution (Stanley et al., 2017), where the evolutionary fitness is estimated regret, as levels only stay in the population (that is, the level replay buffer) if they meet the high-regret criterion for curation. |
| Open Source Code | Yes | An open source implementation of ACCEL reproducing our experiments is available at https://github.com/facebookresearch/dcd. |
| Open Datasets | Yes | We begin with a partially-observable navigation environment, where we test our agent's transfer capabilities on human-designed levels. For ACCEL we begin with empty rooms and randomly edit the block locations (by adding or removing blocks), as well as the goal location. The MiniHack environment is an open-source Gym environment (Brockman et al., 2016), which wraps the game of NetHack via the NetHack Learning Environment (Küttler et al., 2020). |
| Dataset Splits | Yes | For MiniGrid, we follow the protocol from Jiang et al. (2021a) and select the best hyperparameters using the validation levels {16Rooms, Labyrinth, Maze}. The final hyperparameters chosen are shown in Table 11. We tuned the hyperparameters for our base agent using domain randomization, and conducted a sweep over the learning rate {3e-4, 3e-5}, PPO epochs {5, 20}, entropy coefficient {0, 1e-3} and number of minibatches {4, 32}, using the validation performance on BipedalWalkerHardcore. |
| Hardware Specification | Yes | All training runs used a single V100 GPU, using 10 Intel Xeon E5-2698 v4 CPUs. |
| Software Dependencies | No | The paper mentions using Python, Proximal Policy Optimization (PPO), the Adam optimizer, MiniGrid, MiniHack, and a modified BipedalWalker environment. However, specific version numbers for these software components or libraries are not provided. |
| Experiment Setup | Yes | For a full list of hyperparameters for each experiment please see Table 11 in Section C.3. Table 11 provides detailed hyperparameters for PPO (e.g., PPO rollout length 256, PPO epochs 5, Adam learning rate 1e-4) and ACCEL/PLR specific settings (e.g., Buffer size 10000, Replay rate 0.9, Number of edits 5). |
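
The Pseudocode and Open Datasets rows above describe ACCEL's core loop: either replay a high-regret level from the level replay buffer and train on it, then propose children by small random edits (adding/removing blocks, moving the goal), or evaluate a freshly generated simple level; levels stay in the buffer only while their estimated regret remains high. The sketch below illustrates that loop under stated assumptions: the `Level` encoding, the `edit_level` operator, and the `estimate_regret` and `train_on` callables are hypothetical stand-ins for exposition, not the authors' implementation (see https://github.com/facebookresearch/dcd for the official code).

```python
import random
from dataclasses import dataclass, field

# Illustrative sketch only: level encoding, the edit operator, and the
# regret estimate are placeholder assumptions, not the authors' code.

@dataclass
class Level:
    blocks: set = field(default_factory=set)  # occupied (x, y) cells
    goal: tuple = (5, 5)
    score: float = 0.0                         # estimated regret

def edit_level(level, size=15, num_edits=5):
    """Randomly add/remove blocks or move the goal, mirroring the edits
    described for the MiniGrid navigation setting in the row above."""
    child = Level(blocks=set(level.blocks), goal=level.goal)
    for _ in range(num_edits):
        if random.random() < 0.8:
            cell = (random.randrange(size), random.randrange(size))
            child.blocks.symmetric_difference_update({cell})  # toggle a block
        else:
            child.goal = (random.randrange(size), random.randrange(size))
    return child

def accel_step(buffer, estimate_regret, train_on,
               replay_rate=0.9, buffer_size=10_000, num_edits=5):
    """One curriculum iteration: replay-and-edit a high-regret level,
    or evaluate a fresh simple level; curate the buffer by estimated regret."""
    if buffer and random.random() < replay_rate:
        level = max(buffer, key=lambda l: l.score)  # stand-in for PLR's rank-based sampling
        train_on(level)                             # the student trains only on replayed levels
        candidates = [edit_level(level, num_edits=num_edits)]
    else:
        candidates = [Level()]                      # start from an empty room
    for cand in candidates:
        cand.score = estimate_regret(cand)          # evaluate without training
        buffer.append(cand)
    # Keep only the highest-regret levels in the population.
    buffer.sort(key=lambda l: l.score, reverse=True)
    del buffer[buffer_size:]
```

With a dummy `estimate_regret` (e.g., random scores) and a no-op `train_on`, repeatedly calling `accel_step` on an initially empty list grows a population of progressively edited levels, which is the compounding-complexity behaviour the paper describes.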
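The Dataset Splits and Experiment Setup rows quote a handful of concrete settings from the paper's Table 11 and the tuning sweep for the BipedalWalker base agent. Collected in one place for convenience (the dictionary keys are assumed names; only the numbers quoted above are included, and all other Table 11 settings are omitted):

```python
# Values reproduced from the rows above (paper's Table 11 and the
# BipedalWalkerHardcore tuning sweep); key names are illustrative.
ppo_config = {
    "rollout_length": 256,
    "ppo_epochs": 5,
    "learning_rate": 1e-4,   # Adam
}
accel_plr_config = {
    "level_buffer_size": 10_000,
    "replay_rate": 0.9,
    "num_edits": 5,
}
# Sweep for the domain-randomization base agent, selected by
# validation performance on BipedalWalkerHardcore:
sweep = {
    "learning_rate": [3e-4, 3e-5],
    "ppo_epochs": [5, 20],
    "entropy_coef": [0.0, 1e-3],
    "num_minibatches": [4, 32],
}
```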