Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot
Authors: Joel Z Leibo, Edgar A Dueñez-Guzman, Alexander Vezhnevets, John P Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charlie Beattie, Igor Mordatch, Thore Graepel
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply these test scenarios to standard MARL training algorithms, and demonstrate how Melting Pot reveals weaknesses not apparent from training performance alone. |
| Researcher Affiliation | Industry | ¹DeepMind, ²Google Brain. |
| Pseudocode | No | The paper describes methods in prose and with diagrams (e.g., Fig. 3 for process steps), but it does not include any formally labeled pseudocode blocks or algorithms. |
| Open Source Code | No | The paper states 'Since Melting Pot will be openly released, it can be extended by any interested researchers.', which indicates future availability, not current concrete access to the source code. |
| Open Datasets | No | The paper states 'Since Melting Pot will be openly released, it can be extended by any interested researchers.', indicating future availability of the environments/scenarios used as the dataset, but not current concrete access. |
| Dataset Splits | No | The paper describes training and testing phases but does not explicitly mention or detail a separate validation dataset or how data was split for validation purposes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or processor types) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and architectures like A3C, V-MPO, and OPRE, but it does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., PyTorch 1.9, Python 3.8). |
| Experiment Setup | Yes | Each agent was trained for 10^9 steps. At test time, we set the focal population to be the uniform distribution over the N agents. All agent architectures had the same size convolutional net and LSTM. |
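
The test-time rule quoted in the Experiment Setup row (a focal population that is uniform over the N trained agents) can be pictured with a minimal sketch. It assumes each focal player slot is filled by an independent uniform draw with replacement, which the quoted text does not spell out. The names `sample_focal_agents` and `num_focal_slots` are hypothetical and are not taken from the paper or from any Melting Pot code.

```python
import random
from collections import Counter


def sample_focal_agents(num_trained_agents, num_focal_slots, rng):
    """Pick one trained-agent index per focal slot.

    Assumption (not stated in the source): slots are filled independently,
    uniformly at random, with replacement.
    """
    return [rng.randrange(num_trained_agents) for _ in range(num_focal_slots)]


# Illustrative values only: 4 trained agents, 3 focal slots per episode.
rng = random.Random(0)
counts = Counter()
for _ in range(10_000):
    counts.update(sample_focal_agents(4, 3, rng))

# Each agent index should be chosen about equally often.
fractions = {agent: n / sum(counts.values()) for agent, n in sorted(counts.items())}
print(fractions)
```

Under this assumption, every trained agent contributes equally to the test-time score in expectation, so no single training run dominates the evaluation.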