Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality
Authors: Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings. |
| Researcher Affiliation | Academia | Stefanos Leonardos, Georgios Piliouras Singapore University of Technology and Design {stefanos_leonardos;georgios}@sutd.edu.sg Kelly Spendlove University of Oxford EMAIL |
| Pseudocode | No | The paper describes the Q-learning dynamics and update rules using mathematical equations, but it does not include any structured pseudocode or an algorithm block labeled as such. |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | The paper uses defined game environments (Asymmetric Matching Pennies, Match-Mismatch Game) for its experiments, which are described within the paper itself rather than being external, publicly available datasets with specific access information (URL, DOI, citation). |
| Dataset Splits | No | The paper describes simulations within defined game environments rather than experiments on traditional datasets with explicit training, validation, and test splits (e.g., 80/10/10 percentages or sample counts). |
| Hardware Specification | No | Our simulations concern theoretical abstractions of network games. The total amount of computation is not a concern and all our experiments can be reproduced in any conventional machine. |
| Software Dependencies | No | The paper does not specify any particular software dependencies with version numbers (e.g., specific libraries, frameworks, or solvers with their respective versions) required to reproduce the experiments. |
| Experiment Setup | Yes | We plot the exploration path along two representative exploration-exploitation policies: Explore-Then-Exploit (ETE) [5], which starts with (relatively) high exploration that gradually reduces to zero and Cyclical Learning Rate with 1 cycle (CLR-1) [50], which starts with low exploration, increases to high exploration around the half-life of the cycle and then decays to 0. Summary statistics from 100 runs with 3 profiles of exploration rates in a 7 non-dummy agent instance of (MMG). |
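
The two exploration policies quoted in the Experiment Setup row can be illustrated with a minimal sketch. It reproduces only the qualitative shapes described there: ETE starts high and decays to zero, while CLR-1 starts low, peaks around the midpoint of the cycle, and then decays to zero. The piecewise-linear form, the function names `ete_schedule` and `clr1_schedule`, and the parameters `horizon`, `delta_min`, and `delta_max` are assumptions made for illustration; they are not taken from the paper.

```python
def _progress(t, horizon):
    """Fraction of the run completed, clamped to [0, 1]."""
    return min(max(t / horizon, 0.0), 1.0)


def ete_schedule(t, horizon, delta_max=1.0):
    """Explore-Then-Exploit: high exploration that gradually reduces to zero.

    Linear decay is an assumption; the source only says "gradually reduces".
    """
    return delta_max * (1.0 - _progress(t, horizon))


def clr1_schedule(t, horizon, delta_min=0.1, delta_max=1.0):
    """Cyclical Learning Rate with one cycle (CLR-1).

    Rises from delta_min to delta_max at the midpoint of the cycle, then
    decays to zero. The linear ramps are an assumption.
    """
    frac = _progress(t, horizon)
    if frac <= 0.5:
        # First half: ramp up from low to high exploration.
        return delta_min + (delta_max - delta_min) * (frac / 0.5)
    # Second half: decay from high exploration down to zero.
    return delta_max * (1.0 - (frac - 0.5) / 0.5)


if __name__ == "__main__":
    horizon = 100  # hypothetical number of steps
    for t in (0, 25, 50, 75, 100):
        print(t, round(ete_schedule(t, horizon), 3), round(clr1_schedule(t, horizon), 3))
```

Both functions return an exploration rate for step `t`; in a simulation, that value would be supplied to the learning dynamics at each step.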