Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
Authors: Mikayel Samvelyan, Akbir Khan, Michael D Dennis, Minqi Jiang, Jack Parker-Holder, Jakob Nicolaus Foerster, Roberta Raileanu, Tim Rocktäschel
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that MAESTRO outperforms a number of strong baselines on competitive two-player games, spanning discrete and continuous control settings. |
| Researcher Affiliation | Collaboration | 1 Meta AI, 2 University College London, 3 UC Berkeley, 4 University of Oxford. EMAIL |
| Pseudocode | Yes | Algorithm 1 provides its pseudocode. |
| Open Source Code | No | The paper does not provide a specific link to source code for the described methodology or an explicit statement of code release. |
| Open Datasets | Yes | Laser Tag is a grid-based, two-player zero-sum game proposed by Lanctot et al. (2017); Multi Car Racing (MCR, Schwarting et al., 2021); MCR test environments are the Formula 1 Car Racing tracks from (Jiang et al., 2021a). |
| Dataset Splits | Yes | We selected the best performing settings based on the average return on the unseen validation levels against previously unseen opponents on at least 5 seeds. |
| Hardware Specification | Yes | All experiments are performed on an internal cluster. Each job (representing a seed) is performed with a single Tesla V100 GPU and 10 CPUs. |
| Software Dependencies | No | The paper mentions software like PPO, Griddly, and PyTorch but does not provide specific version numbers for these dependencies as used in their experiments. |
| Experiment Setup | Yes | Table 2 summarises our final hyperparameter choices for all methods. |