Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Multiagent Evaluation under Incomplete Information
Authors: Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Perolat, Michal Valko, Georgios Piliouras, Remi Munos
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker. |
| Researcher Affiliation | Collaboration | Mark Rowland¹, Shayegan Omidshafiei², Karl Tuyls², Julien Pérolat¹, Michal Valko², Georgios Piliouras³, Rémi Munos². ¹DeepMind London, ²DeepMind Paris, ³Singapore University of Technology and Design |
| Pseudocode | Yes | Algorithm 1 ResponseGraphUCB(δ, S, C(δ)) |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | Second, we analyze a Soccer meta-game with the payoffs in Liu et al. [33, Figure 2]... Finally, we consider a Kuhn poker meta-game with asymmetric payoffs and 3 players with access to 3 agents each, similar to the domain analyzed in [36] |
| Dataset Splits | No | No explicit train/validation/test dataset splits (percentages, absolute counts, or references to predefined splits with specific details) are provided. The paper discusses simulating noisy outcomes and sampling interactions. |
| Hardware Specification | No | No specific hardware details (such as CPU/GPU models, memory, or detailed cloud/cluster configurations) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'MuJoCo simulation environment [46]' but does not provide a specific version number. No other specific software components with version numbers are listed. |
| Experiment Setup | Yes | In all domains, noisy outcomes are simulated by drawing the winning player i.i.d. from a Bernoulli(M^k(s)) distribution over payoff tables M. We build intuition by evaluating ResponseGraphUCB(δ: 0.1, S: UE, C: UCB), i.e., with a 90% confidence level, on a two-player game with payoffs shown in Fig. 4.1a. Due to the much larger strategy spaces of these games, we cap the number of samples available at 1e5. |
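
The experiment setup quoted above (i.i.d. Bernoulli outcomes drawn from a payoff table, evaluated at δ = 0.1) can be illustrated with a minimal sketch. It is not the paper's adaptive algorithm: it samples every strategy profile a fixed number of times and uses a standard two-sided Hoeffding bound, which is an assumption about what the UCB-style confidence interval looks like. The payoff table `PAYOFFS`, the function names, and the sample count of 1000 are all hypothetical and chosen only for illustration.

```python
import math
import random


def simulate_outcomes(win_prob, n_samples, rng):
    """Draw n_samples i.i.d. Bernoulli(win_prob) outcomes (1 = win, 0 = loss)."""
    return [1 if rng.random() < win_prob else 0 for _ in range(n_samples)]


def hoeffding_interval(outcomes, delta):
    """Two-sided Hoeffding interval that holds with probability >= 1 - delta.

    Assumption: a plain Hoeffding bound for a single profile, with no
    union bound over profiles; the paper's exact construction may differ.
    """
    n = len(outcomes)
    mean = sum(outcomes) / n
    radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    # Clip to [0, 1] because the estimated quantity is a win probability.
    return max(0.0, mean - radius), min(1.0, mean + radius)


# Hypothetical 2x2 payoff table: probability that player 1 wins
# under each strategy profile (row strategy, column strategy).
PAYOFFS = {(0, 0): 0.5, (0, 1): 0.8, (1, 0): 0.3, (1, 1): 0.5}

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for profile, p in PAYOFFS.items():
        samples = simulate_outcomes(p, 1000, rng)
        lo, hi = hoeffding_interval(samples, delta=0.1)
        print(f"profile {profile}: true p={p:.2f}, 90% interval=[{lo:.3f}, {hi:.3f}]")
```

With δ = 0.1 each interval is a 90% confidence interval for a single profile. An adaptive scheme would instead choose which profile to sample next and stop once the intervals are narrow enough to fix the ranking, subject to a sample cap such as the 1e5 mentioned in the table.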