Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multiagent Evaluation under Incomplete Information

Authors: Mark Rowland, Shayegan Omidshafiei, Karl Tuyls, Julien Pérolat, Michal Valko, Georgios Piliouras, Rémi Munos

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper investigates multiagent evaluation in the incomplete information regime, involving general-sum many-player games with noisy outcomes. We propose adaptive algorithms for accurate ranking, provide correctness and sample complexity guarantees, then introduce a means of connecting uncertainties in noisy match outcomes to uncertainties in rankings. We evaluate the performance of these approaches in several domains, including Bernoulli games, a soccer meta-game, and Kuhn poker. |
| Researcher Affiliation | Collaboration | Mark Rowland (1), Shayegan Omidshafiei (2), Karl Tuyls (2), Julien Pérolat (1), Michal Valko (2), Georgios Piliouras (3), Rémi Munos (2). Affiliations: (1) DeepMind London, (2) DeepMind Paris, (3) Singapore University of Technology and Design. |
| Pseudocode | Yes | Algorithm 1 ResponseGraphUCB(δ, S, C(δ)) |
| Open Source Code | No | No explicit statement about providing open-source code or a link to a code repository for the described methodology was found. |
| Open Datasets | Yes | Second, we analyze a Soccer meta-game with the payoffs in Liu et al. [33, Figure 2]... Finally, we consider a Kuhn poker meta-game with asymmetric payoffs and 3 players with access to 3 agents each, similar to the domain analyzed in [36] |
| Dataset Splits | No | No explicit train/validation/test dataset splits (percentages, absolute counts, or references to predefined splits with specific details) are provided. The paper discusses simulating noisy outcomes and sampling interactions. |
| Hardware Specification | No | No specific hardware details (such as CPU/GPU models, memory, or detailed cloud/cluster configurations) used for running the experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'MuJoCo simulation environment [46]' but does not provide a specific version number. No other specific software components with version numbers are listed. |
| Experiment Setup | Yes | In all domains, noisy outcomes are simulated by drawing the winning player i.i.d. from a Bernoulli(M^k(s)) distribution over payoff tables M. We build intuition by evaluating ResponseGraphUCB(δ: 0.1, S: UE, C: UCB), i.e., with a 90% confidence level, on a two-player game with payoffs shown in Fig. 4.1a. Due to the much larger strategy spaces of these games, we cap the number of samples available at 1e5. |
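
The sampling procedure quoted in the Experiment Setup entry can be illustrated with a minimal sketch. It is not the paper's ResponseGraphUCB algorithm: it only draws i.i.d. Bernoulli outcomes for each strategy profile of a small payoff table and attaches a confidence interval at δ = 0.1. The payoff values, the function names `simulate_outcomes` and `hoeffding_interval`, the sample count, and the use of a per-profile Hoeffding bound are all assumptions made for illustration; the paper's own confidence bounds and sampling schemes may differ.

```python
import math
import random


def simulate_outcomes(payoff, n_samples, seed=0):
    """Draw n_samples i.i.d. Bernoulli(p) outcomes for every strategy profile.

    payoff maps a strategy profile (tuple) to a win probability in [0, 1].
    Returns a dict mapping each profile to its number of observed wins.
    """
    rng = random.Random(seed)
    wins = {}
    for profile, p in payoff.items():
        wins[profile] = sum(rng.random() < p for _ in range(n_samples))
    return wins


def hoeffding_interval(wins, n_samples, delta):
    """Two-sided Hoeffding interval for a Bernoulli mean.

    Assumption: this is a stand-in bound that holds with probability at
    least 1 - delta for a single profile, not the bound used in the paper.
    """
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n_samples))
    mean = wins / n_samples
    return max(0.0, mean - half_width), min(1.0, mean + half_width)


# Hypothetical two-player, two-strategy payoff table (not taken from the paper).
payoff = {(0, 0): 0.5, (0, 1): 0.85, (1, 0): 0.15, (1, 1): 0.5}
n_samples = 1000  # illustrative; the quoted setup caps samples at 1e5
delta = 0.1       # 90% confidence level, as in the quoted setup

wins = simulate_outcomes(payoff, n_samples)
intervals = {s: hoeffding_interval(w, n_samples, delta) for s, w in wins.items()}

for profile, (low, high) in sorted(intervals.items()):
    print(profile, round(low, 3), round(high, 3))
```

Each interval here holds at the 90% level for one profile in isolation. A joint guarantee over all profiles would need a correction such as a union bound, which this sketch omits.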