Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot
Authors: Joel Z Leibo, Edgar A Duéñez-Guzmán, Alexander Vezhnevets, John P Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charlie Beattie, Igor Mordatch, Thore Graepel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone. |
| Researcher Affiliation | Industry | ¹DeepMind ²Google Brain. |
| Pseudocode | No | The paper describes methods in prose and with diagrams (e.g., Fig. 3 for process steps), but it does not include any formally labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states 'Since Melting Pot will be openly released, it can be extended by any interested researchers.', which indicates planned future availability rather than concrete current access to the source code. |
| Open Datasets | No | The paper states 'Since Melting Pot will be openly released, it can be extended by any interested researchers.', indicating planned future availability of the environments/scenarios that serve as the dataset, but no concrete current access. |
| Dataset Splits | No | The paper describes training and testing phases but does not mention a separate validation set or detail how data was split for validation purposes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and architectures like A3C, V-MPO, and OPRE, but it does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Each agent was trained for 10^9 steps. At test time, we set the focal population to be the uniform distribution over the N agents. All agent architectures had the same size convolutional net and LSTM. |