Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality

Authors: Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
Researcher Affiliation | Academia | Stefanos Leonardos, Georgios Piliouras (Singapore University of Technology and Design, {stefanos_leonardos;georgios}@sutd.edu.sg); Kelly Spendlove (University of Oxford, spendlove@maths.ox.ac.uk)
Pseudocode | No | The paper describes the Q-learning dynamics and update rules using mathematical equations, but it does not include structured pseudocode or a labeled algorithm block. (A minimal illustrative sketch of such an update rule is given after the table.)
Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
Open Datasets | No | The paper runs its experiments in game environments defined within the paper itself (Asymmetric Matching Pennies, the Match-Mismatch Game) rather than on external, publicly available datasets with access information (URL, DOI, or citation).
Dataset Splits | No | The paper describes simulations within defined game environments rather than experiments on traditional datasets with explicit training, validation, and test splits (e.g., 80/10/10 percentages or sample counts).
Hardware Specification | No | Our simulations concern theoretical abstractions of network games. The total amount of computation is not a concern and all our experiments can be reproduced in any conventional machine.
Software Dependencies | No | The paper does not specify software dependencies with version numbers (e.g., particular libraries, frameworks, or solvers and their versions) required to reproduce the experiments.
Experiment Setup | Yes | We plot the exploration path along two representative exploration-exploitation policies: Explore-Then-Exploit (ETE) [5], which starts with (relatively) high exploration that gradually reduces to zero, and Cyclical Learning Rate with 1 cycle (CLR-1) [50], which starts with low exploration, increases to high exploration around the half-life of the cycle and then decays to 0. Summary statistics from 100 runs with 3 profiles of exploration rates in a 7 non-dummy agent instance of (MMG). (An illustrative sketch of the ETE and CLR-1 schedules is given after the table.)
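
As noted in the Pseudocode row, the paper presents its Q-learning dynamics only as equations. The following is a minimal sketch of Boltzmann (softmax) Q-learning with a fixed exploration rate, which is one standard way such dynamics are formulated; it is not the authors' code, and the 2x2 zero-sum payoff matrix, step size, and exploration-rate value below are illustrative assumptions rather than quantities taken from the paper.

    # Minimal sketch (not the authors' released code) of Boltzmann Q-learning
    # with a fixed exploration rate in a stateless 2x2 zero-sum game.
    # Payoffs, step size, and exploration rate are illustrative assumptions.
    import numpy as np

    def boltzmann_policy(q, exploration_rate):
        """Softmax (Boltzmann) choice probabilities for Q-values q."""
        z = q / max(exploration_rate, 1e-12)
        z -= z.max()  # subtract the max for numerical stability
        p = np.exp(z)
        return p / p.sum()

    def q_update(q, action, reward, step_size):
        """Standard Q-learning update for a repeated normal-form game."""
        q = q.copy()
        q[action] += step_size * (reward - q[action])
        return q

    # Illustrative 2x2 zero-sum game (row player's payoffs); values are assumed.
    payoff_row = np.array([[1.0, -1.0],
                           [-1.0, 1.0]])

    rng = np.random.default_rng(0)
    q_row, q_col = np.zeros(2), np.zeros(2)
    exploration_rate, step_size = 0.5, 0.1

    for _ in range(10_000):
        p_row = boltzmann_policy(q_row, exploration_rate)
        p_col = boltzmann_policy(q_col, exploration_rate)
        a_row = rng.choice(2, p=p_row)
        a_col = rng.choice(2, p=p_col)
        r_row = payoff_row[a_row, a_col]
        q_row = q_update(q_row, a_row, r_row, step_size)
        q_col = q_update(q_col, a_col, -r_row, step_size)  # zero-sum: column player gets -r_row

    print("row policy:", boltzmann_policy(q_row, exploration_rate))
    print("col policy:", boltzmann_policy(q_col, exploration_rate))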
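
The Experiment Setup row quotes the paper's description of the two exploration-rate schedules. Below is a small sketch of how those schedules could be written down, assuming simple linear ramps; the functional forms, rate levels, and horizon are assumptions for illustration and are not taken from the paper.

    # Illustrative exploration-rate schedules (assumed linear forms, not the
    # paper's exact definitions):
    # - ETE: starts high and decays to zero.
    # - CLR-1: starts low, peaks around the half-life of the cycle, decays to 0.

    def ete_schedule(step, total_steps, start_rate=1.0):
        """Explore-Then-Exploit: linear decay from start_rate to 0."""
        return start_rate * max(0.0, 1.0 - step / total_steps)

    def clr1_schedule(step, total_steps, low_rate=0.05, high_rate=1.0):
        """One-cycle schedule: ramp up to high_rate at the half-life, then decay to 0."""
        half = total_steps / 2
        if step <= half:
            return low_rate + (high_rate - low_rate) * (step / half)
        return high_rate * max(0.0, 1.0 - (step - half) / half)

    # Example usage over a 1000-step horizon (horizon chosen for illustration).
    total = 1000
    print([round(ete_schedule(t, total), 2) for t in (0, 250, 500, 750, 1000)])
    print([round(clr1_schedule(t, total), 2) for t in (0, 250, 500, 750, 1000)])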