Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments

Authors: Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies.
Researcher Affiliation | Collaboration | Ryan Lowe (McGill University, OpenAI); Yi Wu (UC Berkeley); Aviv Tamar (UC Berkeley); Jean Harb (McGill University, OpenAI); Pieter Abbeel (UC Berkeley, OpenAI); Igor Mordatch (OpenAI)
Pseudocode | Yes | We provide the description of the full algorithm in the Appendix. (See the MADDPG update sketch after this table.)
Open Source Code | No | The paper links to the multi-agent particle environments used in the experiments (https://github.com/openai/multiagent-particle-envs), but does not state that the source code for the proposed MADDPG method itself is publicly available.
Open Datasets | Yes | To perform our experiments, we adopt the grounded communication environment proposed in [24]... The environments are publicly available: https://github.com/openai/multiagent-particle-envs (See the environment-loading sketch after this table.)
Dataset Splits | No | The paper describes training and evaluation (e.g., "We train our models until convergence, and then evaluate them by averaging various metrics for 1000 further iterations"), but the experiments run in a simulation where data is generated on the fly, so no explicit training/validation/test splits of a fixed dataset are specified.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using the "Gumbel-Softmax estimator [14]" but does not name any software with version numbers needed for reproducibility (e.g., programming languages, libraries, or frameworks). (See the Gumbel-Softmax sketch after this table.)
Experiment Setup | Yes | Unless otherwise specified, our policies are parameterized by a two-layer ReLU MLP with 64 units per layer... setting λ = 0.001 in Eq. 7... We choose K = 3 sub-policies for the keep-away and cooperative navigation environments, and K = 2 for predator-prey. (See the parameterization sketch after this table.)
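
The Pseudocode row points to the full MADDPG algorithm in the paper's appendix. As a reading aid, the following is a minimal PyTorch-style sketch of the core update that algorithm describes: a centralized critic per agent trained on joint observations and actions, and a decentralized actor updated through that critic. The helper names, batch layout, and default values are illustrative assumptions, not the authors' code.

```python
# Hedged sketch of one MADDPG update step for agent i (illustrative, not the authors' code).
# Assumes lists of per-agent actor/critic modules and a batch sampled from a shared replay buffer.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    # Two-layer ReLU MLP, matching the architecture reported in the paper.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

def maddpg_update(i, batch, actors, critics, target_actors, target_critics,
                  actor_opts, critic_opts, gamma=0.95):  # gamma default is an illustrative choice
    """One gradient step for agent i. `batch` is assumed to hold lists of per-agent
    tensors obs[j], act[j], next_obs[j] of shape (B, dim), plus rew[i] and done of shape (B, 1)."""
    obs, act, rew = batch["obs"], batch["act"], batch["rew"]
    next_obs, done = batch["next_obs"], batch["done"]
    n = len(actors)

    # Centralized critic update: y = r_i + gamma * Q'_i(x', a'_1, ..., a'_N), with a'_j = mu'_j(o'_j).
    with torch.no_grad():
        next_act = [target_actors[j](next_obs[j]) for j in range(n)]
        q_next = target_critics[i](torch.cat(next_obs + next_act, dim=-1))
        y = rew[i] + gamma * (1.0 - done) * q_next
    q = critics[i](torch.cat(obs + act, dim=-1))
    critic_loss = nn.functional.mse_loss(q, y)
    critic_opts[i].zero_grad(); critic_loss.backward(); critic_opts[i].step()

    # Decentralized actor update: ascend Q_i with a_i = mu_i(o_i) and the other
    # agents' actions taken from the replay buffer, as in the paper's gradient.
    act_i = actors[i](obs[i])
    joint_act = [act_i if j == i else act[j] for j in range(n)]
    actor_loss = -critics[i](torch.cat(obs + joint_act, dim=-1)).mean()
    actor_opts[i].zero_grad(); actor_loss.backward(); actor_opts[i].step()
```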
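
The Open Datasets row cites the released particle environments. The sketch below shows how one might load and step a scenario, assuming the repository's make_env helper and its per-agent, Gym-style interface; the scenario name and the random action format are assumptions made purely for illustration.

```python
# Hedged sketch of interacting with the released particle environments
# (https://github.com/openai/multiagent-particle-envs). The make_env helper and the
# per-agent list interface follow that repository as we understand it; treat the
# details below as assumptions rather than a verified recipe.
import numpy as np
from make_env import make_env  # helper provided by the multiagent-particle-envs repository

env = make_env("simple_spread")   # cooperative-navigation scenario (assumed name)
obs_n = env.reset()               # list with one observation per agent
for _ in range(25):               # short rollout purely for illustration
    # Random action vectors stand in for trained MADDPG policies here.
    act_n = [np.random.uniform(0, 1, env.action_space[i].n) for i in range(env.n)]
    obs_n, rew_n, done_n, info_n = env.step(act_n)
```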
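
The Software Dependencies row notes that the paper relies on the Gumbel-Softmax estimator [14] for differentiable sampling of discrete actions, without naming library versions. Below is a minimal, generic sketch of that sampling step, not the authors' implementation.

```python
# Minimal sketch of Gumbel-Softmax sampling, as cited by the paper for
# differentiable discrete actions; a generic illustration only.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0, hard=False):
    """Draw an approximately one-hot, differentiable sample from a categorical
    distribution parameterized by `logits`."""
    gumbel_noise = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y = F.softmax((logits + gumbel_noise) / temperature, dim=-1)
    if hard:
        # Straight-through variant: the forward pass uses the one-hot argmax,
        # while gradients flow through the soft sample.
        y_hard = F.one_hot(y.argmax(dim=-1), logits.shape[-1]).float()
        y = (y_hard - y).detach() + y
    return y
```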
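
The Experiment Setup row reports a two-layer, 64-unit ReLU MLP for the policies and an ensemble of K sub-policies per agent. The sketch below shows one way such a parameterization and per-episode sub-policy selection could be written; the class name and selection logic are assumptions, not the released configuration.

```python
# Hedged sketch of the reported policy parameterization: a two-layer ReLU MLP with
# 64 units per layer, plus a per-episode ensemble of K sub-policies
# (K = 3 for keep-away and cooperative navigation, K = 2 for predator-prey).
import random
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden=64):
    # Two-layer ReLU MLP with 64 units per layer, as reported in the paper.
    return nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, act_dim))

class PolicyEnsemble:
    """Holds K sub-policies and draws one uniformly at random for each episode
    (an illustrative reading of the paper's policy-ensemble setup)."""
    def __init__(self, obs_dim, act_dim, k):
        self.sub_policies = [make_policy(obs_dim, act_dim) for _ in range(k)]
        self.active = self.sub_policies[0]

    def new_episode(self):
        self.active = random.choice(self.sub_policies)

    def __call__(self, obs):
        return self.active(obs)
```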