Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Authors: Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
Researcher Affiliation | Collaboration | Ryan Lowe (McGill University, OpenAI); Yi Wu (UC Berkeley); Aviv Tamar (UC Berkeley); Jean Harb (McGill University, OpenAI); Pieter Abbeel (UC Berkeley, OpenAI); Igor Mordatch (OpenAI)
Pseudocode | Yes | We provide the description of the full algorithm in the Appendix. (See the MADDPG update sketch after this table.)
Open Source Code | No | The paper links to the multi-agent particle environments used in the experiments (https://github.com/openai/multiagent-particle-envs), but does not state that the source code for the proposed MADDPG method itself is publicly available.
Open Datasets | Yes | To perform our experiments, we adopt the grounded communication environment proposed in [24]... The environments are publicly available: https://github.com/openai/multiagent-particle-envs (See the environment-loading sketch after this table.)
Dataset Splits | No | The paper describes training and evaluation (e.g., "We train our models until convergence, and then evaluate them by averaging various metrics for 1000 further iterations"), but the experiments run in a simulation where data is generated on the fly, so no explicit training/validation/test splits of a fixed dataset are specified.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the "Gumbel-Softmax estimator [14]" but does not name any software with version numbers needed for reproducibility (e.g., programming languages, libraries, or frameworks). (See the Gumbel-Softmax sketch after this table.)
Experiment Setup | Yes | Unless otherwise specified, our policies are parameterized by a two-layer ReLU MLP with 64 units per layer... setting λ = 0.001 in Eq. 7... We choose K = 3 sub-policies for the keep-away and cooperative navigation environments, and K = 2 for predator-prey. (See the parameterization sketch after this table.)
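
The Pseudocode row points to the full MADDPG algorithm in the paper's appendix. As a reading aid, the following is a minimal PyTorch-style sketch of the core update that algorithm describes: a centralized critic per agent trained on joint observations and actions, and a decentralized actor updated through that critic. The helper names, batch layout, and default values are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of one MADDPG update step for agent i (illustrative, not the authors' code).
# Assumes lists of per-agent actor/critic modules and a batch sampled from a shared replay buffer.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    # Two-layer ReLU MLP, matching the architecture reported in the paper.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

def maddpg_update(i, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.95):  # gamma default is an illustrative choice
    """One gradient step for agent i. `batch` is assumed to hold lists of per-agent
    tensors obs[j], act[j], next_obs[j] of shape (B, dim), plus rew[i] and done of shape (B, 1)."""
    obs, act, rew = batch["obs"], batch["act"], batch["rew"]
    next_obs, done = batch["next_obs"], batch["done"]
    n = len(actors)

    # Centralized critic update: y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N), with a'_j = mu'_j(o'_j).
    with torch.no_grad():
        next_act = [target_actors[j](next_obs[j]) for j in range(n)]
        q_next = target_critics[i](torch.cat(next_obs + next_act, dim=-1))
        y = rew[i] + gamma * (1.0 - done) * q_next
    q = critics[i](torch.cat(obs + act, dim=-1))
    critic_loss = nn.functional.mse_loss(q, y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Decentralized actor update: ascend Q_i with a_i = mu_i(o_i) and the other
    # agents' actions taken from the replay buffer, as in the paper's gradient.
    act_i = actors[i](obs[i])
    joint_act = [act_i if j == i else act[j] for j in range(n)]
    actor_loss = -critics[i](torch.cat(obs + joint_act, dim=-1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```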
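
The Open Datasets row cites the released particle environments. The sketch below shows how one might load and step a scenario, assuming the repository's make_env helper and its per-agent, Gym-style interface; the scenario name and the random action format are assumptions made purely for illustration.

```python
# Hedged sketch of interacting with the released particle environments
# (https://github.com/openai/multiagent-particle-envs). The make_env helper and the
# per-agent list interface follow that repository as we understand it; treat the
# details below as assumptions rather than a verified recipe.
import numpy as np
from make_env import make_env  # helper provided by the multiagent-particle-envs repository

env = make_env("simple_spread")   # cooperative-navigation scenario (assumed name)
obs_n = env.reset()               # list with one observation per agent
for _ in range(25):               # short rollout purely for illustration
    # Random action vectors stand in for trained MADDPG policies here.
    act_n = [np.random.uniform(0, 1, env.action_space[i].n) for i in range(env.n)]
    obs_n, rew_n, done_n, info_n = env.step(act_n)
```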
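
The Software Dependencies row notes that the paper relies on the Gumbel-Softmax estimator [14] for differentiable sampling of discrete actions, without naming library versions. Below is a minimal, generic sketch of that sampling step, not the authors' implementation.

```python
# Minimal sketch of Gumbel-Softmax sampling, as cited by the paper for
# differentiable discrete actions; a generic illustration only.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0, hard=False):
    """Draw an approximately one-hot, differentiable sample from a categorical
    distribution parameterized by `logits`."""
    gumbel_noise = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbel_noise) / temperature, dim=-1)
    if hard:
        # Straight-through variant: the forward pass uses the one-hot argmax,
        # while gradients flow through the soft sample.
        y_hard = F.one_hot(y.argmax(dim=-1), logits.shape[-1]).float()
        y = (y_hard - y).detach() + y
    return y
```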
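
The Experiment Setup row reports a two-layer, 64-unit ReLU MLP for the policies and an ensemble of K sub-policies per agent. The sketch below shows one way such a parameterization and per-episode sub-policy selection could be written; the class name and selection logic are assumptions, not the released configuration.

```python
# Hedged sketch of the reported policy parameterization: a two-layer ReLU MLP with
# 64 units per layer, plus a per-episode ensemble of K sub-policies
# (K = 3 for keep-away and cooperative navigation, K = 2 for predator-prey).
import random
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden=64):
    # Two-layer ReLU MLP with 64 units per layer, as reported in the paper.
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, act_dim))

class PolicyEnsemble:
    """Holds K sub-policies and draws one uniformly at random for each episode
    (an illustrative reading of the paper's policy-ensemble setup)."""
    def __init__(self, obs_dim, act_dim, k):
        self.sub_policies = [make_policy(obs_dim, act_dim) for _ in range(k)]
        self.active = self.sub_policies[0]

    def new_episode(self):
        self.active = random.choice(self.sub_policies)

    def __call__(self, obs):
        return self.active(obs)
```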