reproducibilityindex.ai

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Authors: Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments demonstrate that PAIRED produces a natural curriculum of increasingly complex environments, and PAIRED agents achieve higher zero-shot transfer performance when tested in highly novel environments. (Section 5, Experiments)
Researcher Affiliation	Collaboration	University of California Berkeley AI Research (BAIR), Berkeley, CA, 94704; Google Research, Brain team, Mountain View, CA, 94043
Pseudocode	Yes	Algorithm 1: PAIRED.
Open Source Code	Yes	The code for PAIRED and our experiments is available in open source at https://github.com/google-research/google-research/tree/master/social_rl/.
Open Datasets	Yes	Here we investigate navigation tasks (based on [9]), in which an agent must explore to find a goal (green square in Figure 1) while navigating around obstacles. [9] Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018. To compare more closely with prior work on minimax adversarial RL [28, 43], we construct an additional experiment in a modified version of the MuJoCo hopper domain [42]. [42] Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 5026-5033. IEEE, 2012.
Dataset Splits	No	Parameters for the emergent complexity task are selected to maximize the solved path length, and parameters for the transfer task are selected using a set of validation environments. While validation environments are mentioned, no specific dataset split information (percentages, counts, or explicit methodology for fixed datasets) is provided. The environments are generated rather than split from a pre-existing dataset.
Hardware Specification	No	The paper mentions 'funding computation expenses associated with this work' but does not specify any hardware details such as GPU/CPU models or specific computing resources used for experiments.
Software Dependencies	No	All agents are trained with PPO [35]. The paper refers to algorithms and environments by name (PPO, OpenAI Gym, MuJoCo) and cites papers for them, but does not specify version numbers for any software dependencies.
Experiment Setup	Yes	Further details about network architecture and hyperparameters are given in Appendix F. All agents are trained with PPO [35].