Bandit Learning in Concave N-Person Games
Authors: Mario Bravo, David Leslie, Panayotis Mertikopoulos
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games... our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization. Theorem 5.1. Suppose that the players of a monotone game G ≡ G(N, X, u) follow (MD-b) with step-size γ_n and query radius δ_n such that... Then, the sequence of realized actions X̂_n converges to Nash equilibrium with probability 1. Theorem 5.2. Let x* be the (necessarily unique) Nash equilibrium of a β-strongly monotone game... we have E[‖X̂_n − x*‖²] = O(n^(−1/3)). (This rate statement is restated after the table.) |
| Researcher Affiliation | Collaboration | Mario Bravo, Universidad de Santiago de Chile, Departamento de Matemática y Ciencia de la Computación, mario.bravo.g@usach.cl; David Leslie, Lancaster University & PROWLER.io, d.leslie@lancaster.ac.uk; Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France, panayotis.mertikopoulos@imag.fr |
| Pseudocode | Yes | Algorithm 1: Multi-agent mirror descent with bandit feedback (player indices suppressed); a hedged sketch of this update appears after the table. |
| Open Source Code | No | The paper is theoretical and does not mention releasing source code or provide links to a repository. |
| Open Datasets | No | The paper is theoretical and does not use any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations for empirical evaluation. It specifies parameters of the proposed algorithm (step-sizes and query radii), but no experimental configuration. |
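
To make the rate result quoted in the Research Type row concrete, here is a hedged restatement of Theorem 5.2 with the step-size and query-radius schedules written out explicitly. The exact schedule exponents are my reading of the theorem's conditions (γ and δ are the paper's tunable constants), so treat this as an illustrative restatement rather than a verbatim quote.

```latex
% Hedged restatement of Theorem 5.2: in a beta-strongly monotone game with unique
% Nash equilibrium x*, running (MD-b) with the schedules below (my reading of the
% theorem's conditions) yields the O(n^{-1/3}) rate quoted in the table above.
\[
  \gamma_n = \frac{\gamma}{n},
  \qquad
  \delta_n = \frac{\delta}{n^{1/3}}
  \quad\Longrightarrow\quad
  \mathbb{E}\!\left[\lVert \hat{X}_n - x^{\ast} \rVert^{2}\right] = O\!\left(n^{-1/3}\right).
\]
```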
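The paper provides pseudocode (Algorithm 1, "Multi-agent mirror descent with bandit feedback") but no source code. The following is a minimal Python sketch of the single-point estimation idea behind that algorithm, assuming a Euclidean regularizer (so the mirror step reduces to a projection), box-shaped action sets, and illustrative 1/n and n^(−1/3) schedules; the function names, the box projection, and the handling of feasibility near the boundary are simplifications and not the paper's exact construction.

```python
import numpy as np


def sample_sphere(dim, rng):
    """Draw a perturbation direction z_i uniformly from the unit sphere in R^dim."""
    z = rng.standard_normal(dim)
    return z / np.linalg.norm(z)


def project_box(x, lo, hi):
    """Euclidean prox-map used here for illustration: projection onto the box [lo, hi]^dim."""
    return np.clip(x, lo, hi)


def mirror_descent_bandit(payoffs, dims, T=10_000, lo=-1.0, hi=1.0,
                          gamma0=1.0, delta0=0.1, seed=0):
    """Sketch of multi-agent mirror descent with bandit (payoff-only) feedback.

    payoffs : list of callables u_i(actions) -> float, one per player, where
              `actions` is the list of all players' realized actions.
    dims    : list of action-space dimensions d_i.
    """
    rng = np.random.default_rng(seed)
    num_players = len(dims)
    X = [np.zeros(d) for d in dims]  # base points X_i, one per player

    for n in range(1, T + 1):
        # Illustrative schedules: gamma_n ~ 1/n and delta_n ~ n^(-1/3).
        gamma_n = gamma0 / n
        delta_n = delta0 / n ** (1.0 / 3.0)

        # Each player perturbs their base point and plays the realized action X̂_i.
        Z = [sample_sphere(d, rng) for d in dims]
        X_hat = [project_box(X[i] + delta_n * Z[i], lo, hi) for i in range(num_players)]

        # Bandit feedback: each player observes only their own realized payoff ...
        u_hat = [payoffs[i](X_hat) for i in range(num_players)]

        # ... and forms the single-point gradient estimate (d_i / delta_n) * û_i * z_i.
        V_hat = [(dims[i] / delta_n) * u_hat[i] * Z[i] for i in range(num_players)]

        # Mirror (here: projected) ascent step on each player's own payoff.
        X = [project_box(X[i] + gamma_n * V_hat[i], lo, hi) for i in range(num_players)]

    return X
```

For instance, passing two one-dimensional quadratic payoff functions of a strongly monotone game should drive the returned base points toward the game's unique Nash equilibrium, mirroring the behavior guaranteed by Theorems 5.1 and 5.2.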