Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Authors: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments. (Section 5, Experiments)
Researcher Affiliation | Collaboration | University of California Berkeley AI Research (BAIR), Berkeley, CA, 94704; Google Research, Brain team, Mountain View, CA, 94043
Pseudocode | Yes | Algorithm 1: PAIRED.
Open Source Code | Yes | The code for PAIRED and our experiments is available in open source at https://github.com/google-research/google-research/tree/master/social_rl/.
Open Datasets | Yes | Here we investigate navigation tasks (based on [9]), in which an agent must explore to find a goal (green square in Figure 1) while navigating around obstacles. [9] Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018. To compare more closely with prior work on minimax adversarial RL [28, 43], we construct an additional experiment in a modified version of the MuJoCo hopper domain [42]. [42] Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033. IEEE, 2012.
Dataset Splits | No | Parameters for the emergent complexity task are selected to maximize the solved path length, and parameters for the transfer task are selected using a set of validation environments. While validation environments are mentioned, no specific dataset split information (percentages, counts, or explicit methodology for fixed datasets) is provided. The environments are generated rather than split from a pre-existing dataset.
Hardware Specification | No | The paper mentions 'funding computation expenses associated with this work' but does not specify any hardware details such as GPU/CPU models or specific computing resources used for experiments.
Software Dependencies | No | All agents are trained with PPO [35]. The paper refers to algorithms and environments by name (PPO, OpenAI Gym, MuJoCo) and cites papers for them, but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | Further details about network architecture and hyperparameters are given in Appendix F. All agents are trained with PPO [35].
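
The Pseudocode entry above refers to Algorithm 1 (PAIRED). As a rough illustration of the quantity that algorithm is organized around, the sketch below estimates regret from returns collected in a single generated environment. It assumes the estimate is the antagonist's maximum return minus the protagonist's mean return; the function name `estimate_regret` and the sample values are hypothetical and are not taken from the released code.

```python
from statistics import mean


def estimate_regret(protagonist_returns, antagonist_returns):
    """Estimate regret for one generated environment.

    Assumption: regret is approximated as the best antagonist return
    minus the average protagonist return over the sampled trajectories.
    """
    if not protagonist_returns or not antagonist_returns:
        raise ValueError("each agent needs at least one trajectory return")
    return max(antagonist_returns) - mean(protagonist_returns)


# Hypothetical returns from a few rollouts in the same environment.
protagonist = [1.0, 3.0]
antagonist = [2.0, 5.0]
regret = estimate_regret(protagonist, antagonist)
print(regret)
```

Under this assumption, an environment that neither agent can solve yields a regret near zero, so an environment generator rewarded by regret gains nothing from proposing unsolvable tasks.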