Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
Authors: Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show the strength of our approach compared to existing methods in cooperative as well as competitive scenarios, where agent populations are able to discover various physical and informational coordination strategies. |
| Researcher Affiliation | Collaboration | Ryan Lowe (McGill University, OpenAI); Yi Wu (UC Berkeley); Aviv Tamar (UC Berkeley); Jean Harb (McGill University, OpenAI); Pieter Abbeel (UC Berkeley, OpenAI); Igor Mordatch (OpenAI) |
| Pseudocode | Yes | We provide the description of the full algorithm in the Appendix. (A hedged sketch of the corresponding actor-critic update appears below the table.) |
| Open Source Code | No | The paper provides a link to the multi-agent particle environments used for experiments (https://github.com/openai/multiagent-particle-envs), but does not state that the source code for their proposed MADDPG methodology itself is made publicly available. |
| Open Datasets | Yes | To perform our experiments, we adopt the grounded communication environment proposed in [24]... The environments are publicly available: https://github.com/openai/multiagent-particle-envs |
| Dataset Splits | No | The paper describes training and evaluation (e.g., "We train our models until convergence, and then evaluate them by averaging various metrics for 1000 further iterations"), but it operates in a simulation environment where data is generated dynamically, and thus does not specify explicit training/validation/test splits of a fixed dataset. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using the "Gumbel-Softmax estimator [14]" but does not specify any software names with version numbers for reproducibility (e.g., programming languages, libraries, frameworks with their versions). (A minimal Gumbel-Softmax sketch appears below the table.) |
| Experiment Setup | Yes | Unless otherwise specified, our policies are parameterized by a two-layer ReLU MLP with 64 units per layer... setting λ = 0.001 in Eq. 7... We choose K = 3 sub-policies for the keep-away and cooperative navigation environments, and K = 2 for predator-prey. (A sketch of the sub-policy ensemble appears below the table.) |
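
The Pseudocode row above points to the full MADDPG algorithm in the paper's Appendix. As a rough illustration of the centralized-critic, decentralized-actor update that algorithm describes, here is a minimal PyTorch sketch for a single agent. The helper names (`make_mlp`, `MADDPGAgent`, `update`), the learning rate, the tensor layouts, and the assumption that agent i's action occupies the first slice of the joint action vector are illustrative choices, not the authors' released code (no such code is released, per the Open Source Code row).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_mlp(in_dim, out_dim, hidden=64):
    # Two-layer ReLU MLP with 64 units per layer, as quoted in the Experiment Setup row.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class MADDPGAgent:
    """One agent: a decentralized actor (sees only its own observation) and a
    centralized critic (sees all agents' observations and actions)."""
    def __init__(self, obs_dim, act_dim, joint_obs_dim, joint_act_dim, lr=1e-3):
        self.actor = make_mlp(obs_dim, act_dim)
        self.critic = make_mlp(joint_obs_dim + joint_act_dim, 1)
        self.actor_opt = torch.optim.Adam(self.actor.parameters(), lr=lr)
        self.critic_opt = torch.optim.Adam(self.critic.parameters(), lr=lr)

def update(agent, obs_i, joint_obs, joint_act, target_q):
    """One critic + actor step for agent i on a sampled mini-batch.
    `target_q` stands in for r_i + gamma * Q_i'(x', a_1', ..., a_N') computed
    with target networks, which are omitted here for brevity."""
    # Centralized critic: regress Q_i(x, a_1, ..., a_N) toward the TD target.
    q = agent.critic(torch.cat([joint_obs, joint_act], dim=-1))
    critic_loss = F.mse_loss(q, target_q)
    agent.critic_opt.zero_grad()
    critic_loss.backward()
    agent.critic_opt.step()

    # Decentralized actor: ascend Q_i with agent i's own action replaced by its
    # current policy output and the other agents' actions held fixed.
    act_i = agent.actor(obs_i)
    other_act = joint_act[:, act_i.shape[-1]:]   # assumes agent i's slice comes first
    joint_act_pi = torch.cat([act_i, other_act], dim=-1)
    actor_loss = -agent.critic(torch.cat([joint_obs, joint_act_pi], dim=-1)).mean()
    agent.actor_opt.zero_grad()
    actor_loss.backward()
    agent.actor_opt.step()
```

Target networks, soft updates, exploration noise, and the replay buffer from the full algorithm are left out to keep the sketch short.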
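The Software Dependencies row notes the paper's use of the Gumbel-Softmax estimator for differentiable sampling of discrete actions (e.g., communication symbols), without pinning any library version. For readers unfamiliar with the estimator, here is a minimal sketch of the standard trick; the temperature value is an assumption, as the quoted text does not give one.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0, eps=1e-20):
    """Draw a differentiable, approximately one-hot sample from a categorical
    distribution parameterized by `logits` (standard Gumbel-Softmax trick)."""
    # Gumbel(0, 1) noise: g = -log(-log(u)), u ~ Uniform(0, 1).
    u = torch.rand_like(logits)
    gumbel = -torch.log(-torch.log(u + eps) + eps)
    # Softmax of perturbed logits; lower temperature pushes closer to one-hot.
    return F.softmax((logits + gumbel) / temperature, dim=-1)

# Example: relax a 5-way discrete action so gradients can flow through the critic.
logits = torch.randn(1, 5, requires_grad=True)
action = gumbel_softmax_sample(logits, temperature=0.5)
```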
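The Experiment Setup row also mentions training ensembles of K sub-policies per agent (K = 3 or K = 2 depending on the environment). Below is a minimal sketch of that idea, assuming the paper's scheme of drawing one sub-policy uniformly at random for each episode; the class and method names are hypothetical.

```python
import random
import torch.nn as nn

class PolicyEnsemble(nn.Module):
    """K sub-policies per agent; one is drawn uniformly at random each episode."""
    def __init__(self, obs_dim, act_dim, K=3, hidden=64):
        super().__init__()
        # Each sub-policy is the two-layer, 64-unit ReLU MLP quoted above.
        self.sub_policies = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim),
            )
            for _ in range(K)
        )
        self.active = 0

    def new_episode(self):
        # Resample which sub-policy acts for the coming episode.
        self.active = random.randrange(len(self.sub_policies))

    def forward(self, obs):
        return self.sub_policies[self.active](obs)
```

In the full method each sub-policy would also maintain its own replay buffer; that bookkeeping is omitted here.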