Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Bandit Learning in Concave N-Person Games
Authors: Mario Bravo, David Leslie, Panayotis Mertikopoulos
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games... our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization. Theorem 5.1. Suppose that the players of a monotone game G ≡ G(N, X, u) follow (MD-b) with step-size γₙ and query radius δₙ such that... Then, the sequence of realized actions X̂ₙ converges to Nash equilibrium with probability 1. Theorem 5.2. Let x* be the (necessarily unique) Nash equilibrium of a β-strongly monotone game... we have E[‖X̂ₙ − x*‖²] = O(n^(−1/3)). |
| Researcher Affiliation | Collaboration | Mario Bravo, Universidad de Santiago de Chile, Departamento de Matemática y Ciencia de la Computación, EMAIL; David Leslie, Lancaster University & PROWLER.io, EMAIL; Panayotis Mertikopoulos, Univ. Grenoble Alpes, CNRS, Inria, Grenoble INP, LIG, 38000 Grenoble, France, EMAIL |
| Pseudocode | Yes | Algorithm 1: Multi-agent mirror descent with bandit feedback (player indices suppressed) |
| Open Source Code | No | The paper is theoretical and does not mention releasing source code or provide links to a repository. |
| Open Datasets | No | The paper is theoretical and does not use any datasets for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not mention any specific hardware used for running experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe an experimental setup with hyperparameters or training configurations for empirical evaluation. It describes parameters for a theoretical algorithm, but not an experimental setup. |
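To make the classified pseudocode concrete, here is a minimal, hedged sketch of the kind of procedure Algorithm 1 describes: mirror descent driven only by one-point (scalar payoff) bandit feedback. This is a single-player Euclidean simplification, not the paper's exact multi-agent algorithm; the quadratic payoff, the step-size and query-radius schedules, and the function names (`bandit_mirror_descent`, `project_ball`) are illustrative assumptions.

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Euclidean projection onto the ball of the given radius
    (the Euclidean special case of a mirror/prox step)."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def payoff(x):
    """Toy strongly concave payoff with maximizer x* = (0.5, -0.25).
    The learner only ever observes this scalar value, never its gradient."""
    x_star = np.array([0.5, -0.25])
    return -np.sum((x - x_star) ** 2)

def bandit_mirror_descent(dim=2, T=50000, seed=0):
    """Gradient-free ascent from one-point bandit feedback.

    Each round: perturb the base action along a random unit direction,
    observe a single payoff value, form the one-point gradient estimate
    (dim / delta) * payoff * direction, and take a projected ascent step.
    The schedules gamma_n ~ 1/n and delta_n ~ n^(-1/3) are assumed here
    for illustration (chosen to echo the paper's O(n^(-1/3)) rate regime).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    for n in range(1, T + 1):
        gamma_n = 1.0 / n            # step size (assumed schedule)
        delta_n = n ** (-1.0 / 3.0)  # query radius (assumed schedule)
        z = rng.standard_normal(dim)
        z /= np.linalg.norm(z)       # uniform direction on the unit sphere
        x_hat = x + delta_n * z      # realized (perturbed) action X̂_n
        u = payoff(x_hat)            # scalar bandit feedback only
        g_hat = (dim / delta_n) * u * z          # one-point gradient estimate
        x = project_ball(x + gamma_n * g_hat)    # ascent step + projection
    return x
```

Note that the projection here is a simplification: the paper keeps the perturbed query point feasible by shrinking the action set, whereas this sketch simply evaluates the payoff wherever the perturbation lands.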