Grounding Aleatoric Uncertainty for Unsupervised Environment Design
Authors: Minqi Jiang, Michael Dennis, Jack Parker-Holder, Andrei Lupu, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel, Jakob Foerster
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We prove, and validate on challenging domains, that our approach preserves optimality under the ground-truth distribution, while promoting robustness across the full range of environment settings. Our experiments first focus on a discrete, stochastic binary choice task, with which we validate our theoretical conclusions by demonstrating that CICS can indeed lead to suboptimal policies. |
| Researcher Affiliation | Collaboration | Minqi Jiang (UCL & Meta AI); Michael Dennis (UC Berkeley); Jack Parker-Holder (University of Oxford); Andrei Lupu (MILA & Meta AI); Heinrich Küttler (Inflection AI); Edward Grefenstette (UCL & Cohere); Tim Rocktäschel (UCL); Jakob Foerster (FLAIR, University of Oxford) |
| Pseudocode | Yes | Algorithm 1: Sample-Matched PLR (SAMPLR) |
| Open Source Code | Yes | The code reproducing the experimental results is included in the supplemental material. |
| Open Datasets | No | The paper describes custom environments built on existing frameworks (MiniHack, CarRacingBezier) but does not provide concrete access information (e.g., a link, DOI, or specific dataset citation) for a publicly available dataset used for training. |
| Dataset Splits | No | All agents are trained using PPO [42] with the best hyperparameters found via grid search using a set of validation levels. However, no specific details about the size or percentage of this validation set are provided. |
| Hardware Specification | No | The paper mentions "compute estimates in Appendix C" but does not explicitly provide hardware specifications (e.g., GPU/CPU models or memory details) in the main text. |
| Software Dependencies | No | The paper mentions software components such as PPO, MiniHack, the NetHack Learning Environment, and the CarRacingBezier environment, but does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | We provide extended descriptions of both environments alongside the full details of our architecture and hyperparameter choices in Appendix C. |