Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality
Authors: Stefanos Leonardos, Georgios Piliouras, Kelly Spendlove
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings. |
| Researcher Affiliation | Academia | Stefanos Leonardos and Georgios Piliouras, Singapore University of Technology and Design ({stefanos_leonardos;georgios}@sutd.edu.sg); Kelly Spendlove, University of Oxford (spendlove@maths.ox.ac.uk) |
| Pseudocode | No | The paper specifies the Q-learning dynamics and update rules through mathematical equations, but it contains no structured pseudocode or labeled algorithm block (a hedged sketch of such dynamics is given after this table). |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] |
| Open Datasets | No | The paper's experiments use game environments defined within the paper itself (Asymmetric Matching Pennies and the Match-Mismatch Game) rather than external, publicly available datasets with access information such as a URL, DOI, or citation. |
| Dataset Splits | No | The paper describes simulations within defined game environments rather than experiments on traditional datasets with explicit training, validation, and test splits (e.g., 80/10/10 percentages or sample counts). |
| Hardware Specification | No | Our simulations concern theoretical abstractions of network games. The total amount of computation is not a concern and all our experiments can be reproduced in any conventional machine. |
| Software Dependencies | No | The paper does not specify any particular software dependencies with version numbers (e.g., specific libraries, frameworks, or solvers with their respective versions) required to reproduce the experiments. |
| Experiment Setup | Yes | We plot the exploration path along two representative exploration-exploitation policies: Explore-Then-Exploit (ETE) [5], which starts with (relatively) high exploration that gradually reduces to zero, and Cyclical Learning Rate with 1 cycle (CLR-1) [50], which starts with low exploration, increases to high exploration around the half-life of the cycle and then decays to 0. Summary statistics from 100 runs with 3 profiles of exploration rates in a 7 non-dummy agent instance of (MMG). Both schedules are sketched after this table. |
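Since the paper defines its learning dynamics only through equations, the following is a minimal sketch of smoothed (Boltzmann) Q-learning in a two-player matrix game, the class of dynamics the paper studies. The payoff matrix, step size `alpha`, and fixed exploration temperature `temperature` are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def softmax(q, temperature):
    """Boltzmann choice probabilities for Q-values q at a given exploration temperature."""
    z = q / temperature
    z -= z.max()  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def q_learning_dynamics(A, B, steps=5000, alpha=0.01, temperature=0.5, seed=None):
    """Run smoothed Q-learning dynamics in a bimatrix game.

    A, B: payoff matrices for players 1 and 2 (player 1 picks rows, player 2 columns).
    Returns the trajectory of both players' mixed strategies.
    """
    rng = np.random.default_rng(seed)
    q1 = rng.normal(size=A.shape[0])  # player 1 Q-values, one per action
    q2 = rng.normal(size=B.shape[1])  # player 2 Q-values, one per action
    trajectory = []
    for _ in range(steps):
        x = softmax(q1, temperature)  # player 1 mixed strategy
        y = softmax(q2, temperature)  # player 2 mixed strategy
        # Move each Q-value toward the expected payoff of its action
        # against the opponent's current mixed strategy.
        q1 = (1 - alpha) * q1 + alpha * (A @ y)
        q2 = (1 - alpha) * q2 + alpha * (B.T @ x)
        trajectory.append((x.copy(), y.copy()))
    return trajectory

# A Matching Pennies-style zero-sum game (payoff values are illustrative).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
trajectory = q_learning_dynamics(A, -A, temperature=0.5, seed=0)
print("final strategies:", trajectory[-1])
```

In the paper the exploration rate is the central control parameter; here it enters as the softmax temperature, and varying it over the run gives time-dependent exploration policies such as those in the experiment setup.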
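The ETE and CLR-1 exploration policies are described only qualitatively in the quoted setup. The sketch below encodes that description with linear ramps; the horizon and the low/high exploration-rate endpoints are hypothetical placeholders, not values from the paper.

```python
def ete_schedule(t, horizon, t_high=1.0):
    """Explore-Then-Exploit: start with high exploration, decay linearly to zero."""
    return t_high * max(0.0, 1.0 - t / horizon)

def clr1_schedule(t, horizon, t_low=0.05, t_high=1.0):
    """Cyclical, 1 cycle: ramp from low to high exploration at the half-life, then decay to zero."""
    half = horizon / 2
    if t <= half:
        return t_low + (t_high - t_low) * (t / half)   # ramp up
    return t_high * max(0.0, 1.0 - (t - half) / half)  # decay to zero

horizon = 1000
for t in (0, 250, 500, 750, 999):
    print(t, round(ete_schedule(t, horizon), 3), round(clr1_schedule(t, horizon), 3))
```

Feeding either schedule's value into the `temperature` argument of the dynamics sketch above, one step at a time, would give a single run of the corresponding policy.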