Emergent Complexity via Multi-Agent Competition

Authors: Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, Igor Mordatch

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This work introduces several competitive multi-agent environments where agents compete in a 3D world with simulated physics. The trained agents learn a wide variety of complex and interesting skills, even though the environments themselves are relatively simple.
Researcher Affiliation | Collaboration | Trapit Bansal (UMass Amherst), Jakub Pachocki (OpenAI), Szymon Sidor (OpenAI), Ilya Sutskever (OpenAI), Igor Mordatch (OpenAI)
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Code for the environments, as well as learned policy parameters for agents in all the environments, is available at https://github.com/openai/multiagent-competition.
Open Datasets | No | The paper describes simulation environments in which data is generated through agent interaction rather than drawn from a pre-existing, publicly available dataset with concrete access information. While the environments are described, they are not presented as a downloadable dataset.
Dataset Splits | No | The paper does not specify fixed training/validation/test dataset splits. It describes generating data via parallel rollouts during training but does not partition a static dataset into these conventional splits.
Hardware Specification | No | The paper mentions running experiments 'on 4 GPUs' but does not provide further hardware details such as the GPU model, memory, or CPU.
Software Dependencies | No | The paper mentions several software components, including the MuJoCo framework (Todorov et al., 2012), Proximal Policy Optimization (PPO) (Schulman et al., 2017), Adam (Kingma & Ba, 2014), and the OpenAI Gym Humanoid-v1 environment, but it does not provide specific version numbers for any of them (a usage sketch follows the table).
Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) with learning rate 0.001. The clipping parameter in PPO ϵ = 0.2, discounting factor γ = 0.995 and generalized advantage estimate parameter λ = 0.95. Each iteration, we collect 409600 samples from the parallel rollouts and perform multiple epochs of PPO training in mini-batches consisting of 5120 samples. For MLP policies we did 6 epochs of SGD per iteration and for LSTM policies we did 3 epochs. (A configuration sketch of these settings follows the table.)
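
As a point of reference for the unversioned dependency list above, here is a minimal sketch of how the named components typically fit together: OpenAI Gym exposes the Humanoid-v1 task, which is simulated by the MuJoCo engine, and PPO with Adam is then run on rollouts from such environments. No versions are pinned because the paper gives none; the snippet assumes the older (pre-0.26) Gym reset/step API and a MuJoCo-enabled Gym installation.

```python
import gym  # OpenAI Gym; requires a MuJoCo-enabled installation

# 'Humanoid-v1' is the environment ID the paper names; later Gym releases
# ship newer revisions (e.g. Humanoid-v2), so the exact ID depends on version.
env = gym.make("Humanoid-v1")

obs = env.reset()                           # pre-0.26 Gym API
for _ in range(10):
    action = env.action_space.sample()      # random policy as a stand-in
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```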
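
The optimization settings quoted in the Experiment Setup row can be collected into a small configuration object. This is only a sketch of the reported hyperparameters, assuming a PPO implementation with a clipped surrogate objective and GAE; the PPOConfig class and its field names are hypothetical and do not come from the paper or its code release.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for the hyperparameters reported in the paper."""
    learning_rate: float = 1e-3            # Adam (Kingma & Ba, 2014)
    clip_epsilon: float = 0.2              # PPO clipped-surrogate parameter
    gamma: float = 0.995                   # discount factor
    gae_lambda: float = 0.95               # generalized advantage estimation
    samples_per_iteration: int = 409_600   # collected from parallel rollouts
    minibatch_size: int = 5_120
    mlp_epochs: int = 6                    # PPO epochs per iteration, MLP policies
    lstm_epochs: int = 3                   # PPO epochs per iteration, LSTM policies

cfg = PPOConfig()
# 409600 samples / 5120 per mini-batch = 80 mini-batch updates per PPO epoch.
minibatches_per_epoch = cfg.samples_per_iteration // cfg.minibatch_size
assert minibatches_per_epoch == 80
```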