Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Improving Regret Approximation for Unsupervised Dynamic Environment Generation
Authors: Harry Mead, Bruno Lacerda, Jakob Foerster, Nick Hawes
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically that MNA outperforms current regret approximations and when combined with DEGen, consistently outperforms existing methods, especially as the size of the environment grows. We have made all our code available here: https://github.com/HarryMJMead/Dynamic-Environment-Generation-for-UED. 1 Introduction ... 5 Experimental Setup For this work, we examine the standard minigrid environment used in previous UED work [11, 25, 40], as well as evaluating UED performance on the modified minigrid with the addition of a key and locked door. |
| Researcher Affiliation | Academia | Harry Mead University of Oxford Bruno Lacerda University of Oxford Jakob Foerster University of Oxford Nick Hawes University of Oxford |
| Pseudocode | Yes | Algorithm 1 DEGen Initialise: student policy πϕ1, generator policy Λϕ2 while not converged do // Sample N trajectories for n = 1 : N do Initialise empty level // Take T student steps for ts = 1 : T do // Generate partial level Sample Λ actions to generate section of level that has been observed but not generated // Take student action Sample π action end for compute score using student trajectory τs assign reward to generator trajectory τg end for Update ϕ1 according to sampled student trajectories Update ϕ2 according to sampled generator trajectories end while |
| Open Source Code | Yes | We have made all our code available here: https://github.com/HarryMJMead/Dynamic-Environment-Generation-for-UED. |
| Open Datasets | Yes | For this work, we examine the standard minigrid environment used in previous UED work [11, 25, 40], as well as evaluating UED performance on the modified minigrid with the addition of a key and locked door. ... The minigrid levels were taken directly from JaxUED [10]. ... For the zero-shot transfer set, we have used the first 20 Sokoban Jr levels that do not exceed 13x13 in size. |
| Dataset Splits | Yes | For the standard minigrid, we use the set of 8 test levels used in previous work [10, 49]. For the modified key minigrid, we modify this set of levels so as to require the agent to unlock the door to reach the goal. ... Figures 9 and 10 show the hand designed levels used for evaluating zero-shot performance. ... For the zero-shot transfer set, we have used the first 20 Sokoban Jr levels that do not exceed 13x13 in size. |
| Hardware Specification | Yes | For all experiments, each run was on 1 Nvidia A40. |
| Software Dependencies | No | All existing methods were trained using implementations based on JaxUED [10], available at https://github.com/DramaCow/jaxued, and SFL [49], available at https://github.com/amacrutherford/sampling-for-learnability. All student agents are trained using PPO [51], as well as the teacher agents used in DEGen and Initial Gen. |
| Experiment Setup | Yes | Detailed training hyperparameters for all domains and UED methods are found in Appendix B.2. Table 1: Learning Hyperparameters. ... Table 2: UED Hyperparameters. |
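
The control flow of Algorithm 1 quoted in the Pseudocode row can be sketched in Python as follows. This is an illustrative skeleton only, not the authors' implementation: the names `Trajectory`, `degen_iteration`, `student_act`, `generator_act`, and `score_fn` are hypothetical, the generator takes a single action per student step (the pseudocode allows several), and the policy updates for ϕ1 and ϕ2 (PPO in the paper) are left out.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical container for the actions and reward of one rollout."""
    steps: list = field(default_factory=list)
    reward: float = 0.0


def degen_iteration(student_act, generator_act, score_fn,
                    n_trajectories, horizon, rng):
    """Collect N paired (student, generator) trajectories for one outer-loop pass."""
    student_batch, generator_batch = [], []
    for _ in range(n_trajectories):
        level = []  # initialise empty level
        tau_s, tau_g = Trajectory(), Trajectory()
        for _ in range(horizon):
            # Generator extends the part of the level that has been
            # observed but not yet generated (one action here, by assumption).
            g_action = generator_act(level, rng)
            level.append(g_action)
            tau_g.steps.append(g_action)
            # Student acts in the partially generated level.
            tau_s.steps.append(student_act(level, rng))
        # The score computed from the student trajectory is assigned
        # as the reward of the generator trajectory.
        tau_g.reward = score_fn(tau_s)
        student_batch.append(tau_s)
        generator_batch.append(tau_g)
    # A full training loop would now update both policies from these batches.
    return student_batch, generator_batch
```

Calling `degen_iteration` with toy callables (for example, a random student and generator and a score function returning the trajectory length) is enough to check the data flow before plugging in real policies and a regret approximation.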