Learning with Bandit Feedback in Potential Games
Authors: Amélie Heliou, Johanne Cohen, Panayotis Mertikopoulos
NeurIPS 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players' side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for ε-approximations of Nash equilibria if we introduce an exploration factor ε > 0 that guarantees that action choice probabilities never fall below ε. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1. |
| Researcher Affiliation | Academia | Johanne Cohen, LRI-CNRS, Université Paris-Sud, Université Paris-Saclay, France (johanne.cohen@lri.fr); Amélie Héliou, LIX, Ecole Polytechnique, CNRS, AMIBio, Inria, Université Paris-Saclay (amelie.heliou@polytechnique.edu); Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, LIG, F-38000, Grenoble, France (panayotis.mertikopoulos@imag.fr) |
| Pseudocode | Yes | Algorithm 1: ε-HEDGE with generic feedback; Algorithm 2: Exponential weights with annealing |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available. |
| Open Datasets | No | The paper is theoretical and does not use any datasets, so no information about public dataset availability is provided. |
| Dataset Splits | No | The paper is theoretical and does not describe experiments with data. Thus, there is no mention of training/validation/test dataset splits. |
| Hardware Specification | No | The paper is theoretical and focuses on mathematical proofs and algorithms. It does not describe any experimental setup or mention hardware specifications. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or versions used for implementation or experimentation. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithm design and convergence analysis. It does not describe an experimental setup with specific hyperparameters or training configurations. |
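Since no source code accompanies the paper, the following is only an illustrative sketch (not the authors' implementation) of one round of the ε-Hedge scheme described in the abstract: exponential-weights choice probabilities mixed with a uniform exploration floor ε, bandit feedback on the realized payoff alone, and an importance-weighted score update. The function name and signature are hypothetical.

```python
import math
import random

def eps_hedge_step(scores, eta, eps, payoff_of):
    """One round of an epsilon-Hedge sketch with bandit feedback.

    scores    : list of cumulative payoff estimates, one per action
    eta       : learning-rate (temperature) parameter
    eps       : exploration factor; each action is played with prob >= eps/n
    payoff_of : callable mapping an action index to its realized payoff in [0, 1]
    Returns the updated scores and the action that was played.
    """
    n = len(scores)
    # Exponential-weights probabilities (max-shift for numerical stability).
    m = max(eta * s for s in scores)
    w = [math.exp(eta * s - m) for s in scores]
    tot = sum(w)
    # Mix with the uniform distribution so no probability falls below eps/n.
    probs = [(1 - eps) * wi / tot + eps / n for wi in w]
    # Bandit feedback: sample an action, observe only its realized payoff.
    a = random.choices(range(n), weights=probs)[0]
    u = payoff_of(a)
    # Importance-weighted estimator: unbiased because E[u/p_a] = u_a.
    scores[a] += u / probs[a]
    return scores, a
```

In the annealed variant (Algorithm 2 of the paper), ε would decrease suitably over rounds so that play converges to an exact Nash equilibrium rather than an ε-approximation.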