Bandit Learning in Concave N-Person Games

Authors: Mario Bravo, David Leslie, Panayotis Mertikopoulos

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games... our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization. Theorem 5.1. Suppose that the players of a monotone game G ≡ G(N, X, u) follow (MD-b) with step-size γ_n and query radius δ_n such that... Then, the sequence of realized actions X̂_n converges to Nash equilibrium with probability 1. Theorem 5.2. Let x* be the (necessarily unique) Nash equilibrium of a β-strongly monotone game... we have E[‖X̂_n − x*‖²] = O(n^{-1/3}).
Researcher Affiliation | Collaboration | Mario Bravo, Universidad de Santiago de Chile, Departamento de Matemática y Ciencia de la Computación, mario.bravo.g@usach.cl; David Leslie, Lancaster University & PROWLER.io, d.leslie@lancaster.ac.uk; Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France, panayotis.mertikopoulos@imag.fr
Pseudocode | Yes | Algorithm 1: Multi-agent mirror descent with bandit feedback (player indices suppressed). A hedged single-player sketch of this scheme is given after the table.
Open Source Code | No | The paper is theoretical and does not mention releasing source code or provide links to a repository.
Open Datasets | No | The paper is theoretical and does not use any datasets for training or evaluation.
Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments.
Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations for empirical evaluation. It describes parameters for a theoretical algorithm, but not an experimental setup.
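
To make the (MD-b) scheme referenced above concrete, here is a minimal single-player sketch of mirror descent with one-point bandit feedback in Python/NumPy. Everything in it is an illustrative assumption rather than the paper's construction: the Euclidean regularizer (which turns the mirror step into projected gradient ascent), the box action set, the quadratic payoff, the γ_n and δ_n schedules, and the helper names md_bandit and project_box are all hypothetical; the paper's actual step-size and radius conditions are the ones elided in Theorem 5.1, and it handles query feasibility with its own adjustment rather than a plain projection.

```python
import numpy as np

def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d (hypothetical helper)."""
    return np.clip(x, lo, hi)

def md_bandit(u, x0, lo, hi, n_steps=10_000, seed=0):
    """Single-player sketch of mirror descent with one-point bandit feedback.

    u  : payoff function, observed only at the queried point (bandit feedback)
    x0 : initial action; lo, hi : box constraints on the action set
    The Euclidean setup and the gamma/delta schedules below are illustrative
    assumptions, not the schedules or conditions stated in the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    d = x.size
    for n in range(1, n_steps + 1):
        gamma = 0.5 / n              # step-size gamma_n (hypothetical schedule)
        delta = 0.5 / n ** 0.25      # query radius delta_n (hypothetical schedule)
        z = rng.normal(size=d)
        z /= np.linalg.norm(z)       # sampling direction z, uniform on the unit sphere
        # Realized query point; the paper keeps queries feasible with its own
        # adjustment, so this projection is a simplification.
        x_hat = project_box(x + delta * z, lo, hi)
        u_hat = u(x_hat)             # the only feedback: the payoff at the query point
        v_hat = (d / delta) * u_hat * z  # one-point estimate of a smoothed-payoff gradient
        x = project_box(x + gamma * v_hat, lo, hi)  # Euclidean mirror (= projected) ascent
    return x

# Example: concave quadratic payoff with maximizer x* = (1, -1)
if __name__ == "__main__":
    target = np.array([1.0, -1.0])
    u = lambda x: -np.sum((x - target) ** 2)
    print(md_bandit(u, x0=[0.0, 0.0], lo=-2.0, hi=2.0))  # should drift toward (1, -1)
```

The d/δ scaling is the standard one-point-estimator trick: a single payoff observation becomes an unbiased gradient estimate of a δ-smoothed payoff. Shrinking δ_n reduces the smoothing bias but inflates the estimator's variance, which is the trade-off the step-size and query-radius conditions of Theorem 5.1 are there to balance.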