Learning with Bandit Feedback in Potential Games

Authors: Amélie Héliou, Johanne Cohen, Panayotis Mertikopoulos

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper examines the equilibrium convergence properties of no-regret learning with exponential weights in potential games. To establish convergence with minimal information requirements on the players' side, we focus on two frameworks: the semi-bandit case (where players have access to a noisy estimate of their payoff vectors, including strategies they did not play), and the bandit case (where players are only able to observe their in-game, realized payoffs). In the semi-bandit case, we show that the induced sequence of play converges almost surely to a Nash equilibrium at a quasi-exponential rate. In the bandit case, the same result holds for ε-approximations of Nash equilibria if we introduce an exploration factor ε > 0 that guarantees that action choice probabilities never fall below ε. In particular, if the algorithm is run with a suitably decreasing exploration factor, the sequence of play converges to a bona fide Nash equilibrium with probability 1. (An illustrative sketch of this ε-explored update appears after the table.)
Researcher Affiliation | Academia | Johanne Cohen, LRI-CNRS, Université Paris-Sud, Université Paris-Saclay, France (johanne.cohen@lri.fr); Amélie Héliou, LIX, École Polytechnique, CNRS, AMIBio, Inria, Université Paris-Saclay (amelie.heliou@polytechnique.edu); Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, LIG, F-38000 Grenoble, France (panayotis.mertikopoulos@imag.fr)
Pseudocode | Yes | The paper provides Algorithm 1 (ε-HEDGE with generic feedback) and Algorithm 2 (exponential weights with annealing); illustrative sketches of both appear after the table.
Open Source Code | No | The paper does not provide any explicit statements or links indicating that source code for the methodology is openly available.
Open Datasets | No | The paper is theoretical and does not mention using any datasets for training; no information about public dataset availability is provided.
Dataset Splits | No | The paper is theoretical and does not describe experiments with data, so there is no mention of training/validation/test dataset splits.
Hardware Specification | No | The paper is theoretical, focusing on mathematical proofs and algorithms; it does not describe any experimental setup or mention hardware specifications.
Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies or versions used for implementation or experimentation.
Experiment Setup | No | The paper focuses on algorithm design and convergence analysis; it does not describe an experimental setup with specific hyperparameters or training configurations.
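
To make the scheme described in the abstract concrete, here is a minimal sketch of one player's ε-explored exponential-weights (Hedge) update under bandit feedback: only the realized payoff of the chosen action is observed, and an importance-weighted estimate stands in for the full payoff vector. This is not the paper's Algorithm 1 reproduced verbatim; the function name, step size eta, and exploration factor eps are illustrative assumptions.

```python
import numpy as np

def eps_hedge(payoff_fn, n_actions, n_rounds, eta=0.1, eps=0.05, rng=None):
    """Sketch: one player's epsilon-explored Hedge update with bandit feedback."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.zeros(n_actions)  # cumulative importance-weighted payoff estimates
    for _ in range(n_rounds):
        logits = eta * scores
        weights = np.exp(logits - logits.max())   # numerically stable exponentiation
        probs = weights / weights.sum()           # exponential-weights mixed strategy
        # Uniform mixing keeps every action's probability >= eps / n_actions,
        # which bounds the importance weights used below.
        probs = (1 - eps) * probs + eps / n_actions
        action = rng.choice(n_actions, p=probs)
        payoff = payoff_fn(action)                # only the realized payoff is observed
        u_hat = np.zeros(n_actions)
        u_hat[action] = payoff / probs[action]    # unbiased estimate of the payoff vector
        scores += u_hat
    return probs

# Hypothetical usage: one learner facing a noisy two-action problem.
rng = np.random.default_rng(0)
strategy = eps_hedge(lambda a: (0.3, 0.7)[a] + 0.1 * rng.standard_normal(),
                     n_actions=2, n_rounds=5000, rng=rng)
print(strategy)  # mass should concentrate on the higher-payoff action
```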
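
The second sketch illustrates the annealing idea behind Algorithm 2: run the same update with a decreasing step size and a decreasing exploration factor, so that (per the abstract) play can converge to an exact Nash equilibrium rather than an ε-approximation. The particular decay schedules eta_t and eps_t below are assumptions for illustration, not the paper's stated choices.

```python
import numpy as np

def annealed_hedge(payoff_fn, n_actions, n_rounds, rng=None):
    """Sketch: exponential weights with a vanishing exploration factor."""
    rng = np.random.default_rng() if rng is None else rng
    scores = np.zeros(n_actions)
    probs = np.full(n_actions, 1.0 / n_actions)
    for t in range(1, n_rounds + 1):
        eta_t = 1.0 / np.sqrt(t)      # assumed decreasing step-size schedule
        eps_t = t ** -0.25            # assumed decreasing exploration schedule
        logits = eta_t * scores
        weights = np.exp(logits - logits.max())
        probs = (1 - eps_t) * weights / weights.sum() + eps_t / n_actions
        action = rng.choice(n_actions, p=probs)
        u_hat = np.zeros(n_actions)
        u_hat[action] = payoff_fn(action) / probs[action]  # importance weighting
        scores += u_hat
    return probs
```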