Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Refining Minimax Regret for Unsupervised Environment Design
Authors: Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael D Dennis, Jakob Nicolaus Foerster
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate that the problems identified in Section 3 do occur, and that ReMiDi alleviates these issues. First, in Section 6.2, we illustrate some of the failure cases of ideal UED in a simple tabular setting. Next, in Section 6.3, we experiment in the canonical Minigrid domain. In Section 6.4, we consider a different setting where regret-based UED results in a policy that performs poorly over a large subset of levels. Finally, we evaluate on a robotics task in Section 6.5. |
| Researcher Affiliation | Academia | University of Oxford; University College London; UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Refining Minimax Regret Distributions |
| Open Source Code | Yes | We publicly release our code at https://github.com/Michael-Beukman/ReMiDi. |
| Open Datasets | Yes | We next consider Minigrid, a common benchmark in UED (Dennis et al., 2020; Jiang et al., 2021a; Parker-Holder et al., 2022). Our final experimental domain is robotics, using Brax (Todorov et al., 2012; Freeman et al., 2021). We evaluate the agent on a set of held-out standard test mazes used in prior work (Jiang et al., 2021a; Parker-Holder et al., 2022; Jiang et al., 2023). In particular, we use Sixteen Rooms, Sixteen Rooms2, Labyrinth, Labyrinth Flipped, Labyrinth2, Standard Maze, Standard Maze2, Standard Maze3, Small Corridor and Large Corridor. |
| Dataset Splits | No | The paper mentions 'a standard set of held-out mazes' for evaluation and running experiments with '10 seeds' but does not provide specific train/validation/test dataset splits, proportions, or methodologies for how the data was partitioned for training versus validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instances used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'JaxUED', 'Brax', 'PPO', and 'LSTM' but does not specify their version numbers or other specific ancillary software dependencies required for replication. |
| Experiment Setup | Yes | Appendix D.3 'Hyperparameter Tuning' and Table 9 'Hyperparameters' provide specific values for parameters such as 'PPO Number of Updates', 'γ', 'λ (GAE)', 'PPO epochs', 'Adam learning rate', 'entropy coefficient', etc., for Minigrid, Lever, and Brax experiments. |