Refining Minimax Regret for Unsupervised Environment Design
Authors: Michael Beukman, Samuel Coward, Michael Matthews, Mattie Fellows, Minqi Jiang, Michael D Dennis, Jakob Nicolaus Foerster
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we empirically demonstrate that the problems identified in Section 3 do occur, and that ReMiDi alleviates these issues. First, in Section 6.2, we illustrate some of the failure cases of ideal UED in a simple tabular setting. Next, in Section 6.3, we experiment in the canonical Minigrid domain. In Section 6.4, we consider a different setting where regret-based UED results in a policy that performs poorly over a large subset of levels. Finally, we evaluate on a robotics task in Section 6.5. |
| Researcher Affiliation | Academia | 1University of Oxford 2University College London 3UC Berkeley. |
| Pseudocode | Yes | Algorithm 1 Refining Minimax Regret Distributions |
| Open Source Code | Yes | We publicly release our code at https://github.com/Michael-Beukman/ReMiDi. |
| Open Datasets | Yes | We next consider Minigrid, a common benchmark in UED (Dennis et al., 2020; Jiang et al., 2021a; Parker-Holder et al., 2022). Our final experimental domain is robotics, using Brax (Todorov et al., 2012; Freeman et al., 2021). We evaluate the agent on a set of held-out standard test mazes used in prior work (Jiang et al., 2021a; Parker-Holder et al., 2022; Jiang et al., 2023). In particular, we use SixteenRooms, SixteenRooms2, Labyrinth, LabyrinthFlipped, Labyrinth2, StandardMaze, StandardMaze2, StandardMaze3, SmallCorridor and LargeCorridor. |
| Dataset Splits | No | The paper mentions 'a standard set of held-out mazes' for evaluation and running experiments with '10 seeds' but does not provide specific train/validation/test dataset splits, proportions, or methodologies for how the data was partitioned for training versus validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or specific cloud instances used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'JaxUED', 'Brax', 'PPO', and 'LSTM' but does not specify their version numbers or other specific ancillary software dependencies required for replication. |
| Experiment Setup | Yes | Appendix D.3 'Hyperparameter Tuning' and Table 9 'Hyperparameters' provide specific values for parameters such as 'PPO Number of Updates', 'γ', 'λGAE', 'PPO epochs', 'Adam learning rate', 'entropy coefficient', etc., for Minigrid, Lever, and Brax experiments. |