Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design
Authors: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments. (Section 5, Experiments) |
| Researcher Affiliation | Collaboration | University of California Berkeley AI Research (BAIR), Berkeley, CA, 94704; Google Research, Brain team, Mountain View, CA, 94043 |
| Pseudocode | Yes | Algorithm 1: PAIRED. |
| Open Source Code | Yes | The code for PAIRED and our experiments is available in open source at https://github.com/google-research/google-research/tree/master/social_rl/. |
| Open Datasets | Yes | Here we investigate navigation tasks (based on [9]), in which an agent must explore to find a goal (green square in Figure 1) while navigating around obstacles. [9] Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for OpenAI Gym. https://github.com/maximecb/gym-minigrid, 2018. To compare more closely with prior work on minimax adversarial RL [28, 43], we construct an additional experiment in a modified version of the MuJoCo hopper domain [42]. [42] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033. IEEE, 2012. |
| Dataset Splits | No | Parameters for the emergent complexity task are selected to maximize the solved path length, and parameters for the transfer task are selected using a set of validation environments. While validation environments are mentioned, no specific dataset split information (percentages, counts, or explicit methodology for fixed datasets) is provided. The environments are *generated* rather than split from a pre-existing dataset. |
| Hardware Specification | No | The paper mentions 'funding computation expenses associated with this work' but does not specify any hardware details such as GPU/CPU models or specific computing resources used for experiments. |
| Software Dependencies | No | All agents are trained with PPO [35]. The paper refers to algorithms and environments by name (PPO, OpenAI Gym, MuJoCo) and cites papers for them, but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | Further details about network architecture and hyperparameters are given in Appendix F. All agents are trained with PPO [35]. |
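
As a reading aid for the Pseudocode row, the following is a minimal sketch of the regret signal that, as we understand Algorithm 1, the environment-generating adversary is trained to maximize: the best return achieved by the antagonist minus the average return achieved by the protagonist on the same generated environment. The function name `estimate_regret` and its arguments are hypothetical and are not taken from the released `social_rl` code; the episode returns are assumed to have been collected elsewhere.

```python
from statistics import mean


def estimate_regret(protagonist_returns, antagonist_returns):
    """Approximate regret on one generated environment.

    Assumption (illustrative only): regret is estimated as the maximum
    antagonist episode return minus the mean protagonist episode return.
    """
    if not protagonist_returns or not antagonist_returns:
        raise ValueError("need at least one episode return per agent")
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical returns from two episodes per agent on the same environment.
regret = estimate_regret([0.0, 0.5], [0.25, 1.0])
```

Under this estimate, an environment that neither agent can solve yields zero regret, and so does one that both solve easily. The signal is largest when the antagonist can solve the environment and the protagonist cannot, which is one way to read the curriculum effect described in the Research Type row.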