Strategizing against No-regret Learners
Authors: Yuan Deng, Jon Schneider, Balasubramanian Sivan
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We study this question and show that, under some mild assumptions, the player can always guarantee himself a utility of at least what he would get in a Stackelberg equilibrium of the game. When the no-regret learner has only two actions, we show that the player cannot get any higher utility than the Stackelberg equilibrium utility. But when the no-regret learner has more than two actions and plays a mean-based no-regret strategy, we show that the player can get strictly higher than the Stackelberg equilibrium utility. We provide a characterization of the optimal game-play for the player against a mean-based no-regret learner as a solution to a control problem. When the no-regret learner's strategy also guarantees him no swap regret, we show that the player cannot get anything higher than the Stackelberg equilibrium utility. (A simulation sketch of the Stackelberg-commitment guarantee appears after the table.) |
| Researcher Affiliation | Collaboration | Yuan Deng, Duke University, ericdy@cs.duke.edu; Jon Schneider, Google Research, jschnei@google.com; Balasubramanian Sivan, Google Research, balusivan@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | No | The paper is theoretical and does not use datasets for training. |
| Dataset Splits | No | The paper is theoretical and does not describe any validation splits or processes. |
| Hardware Specification | No | The paper is theoretical and does not describe any hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not describe any specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any experimental setup details such as hyperparameters or training configurations. |
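
As a hedged illustration of the paper's baseline claim, that committing to a Stackelberg strategy guarantees the player at least the Stackelberg utility against a no-regret learner, the sketch below simulates an optimizer committing to a fixed mixed strategy against a multiplicative-weights learner (a standard mean-based no-regret algorithm). The 2x2 game, the payoff matrices `U` and `V`, and the helper `average_utility` are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

# Hypothetical 2x2 bimatrix game (illustrative only; not from the paper).
# Rows index the optimizer's actions; columns index the learner's actions.
# U[i, j] = optimizer's payoff, V[i, j] = learner's payoff.
U = np.array([[1.0, 3.0],
              [2.0, 1.0]])
V = np.array([[2.0, 1.0],
              [1.0, 3.0]])

def average_utility(x, T=20000):
    """Optimizer commits to mixed strategy x for T rounds while the
    learner runs multiplicative weights (a mean-based no-regret
    algorithm). Returns the optimizer's average per-round utility."""
    eta = np.sqrt(np.log(V.shape[1]) / T)   # standard Hedge step size
    weights = np.ones(V.shape[1])
    total = 0.0
    for _ in range(T):
        p = weights / weights.sum()         # learner's current mixed strategy
        total += x @ U @ p                  # optimizer's expected payoff this round
        gains = x @ V                       # learner's expected payoff per action
        weights *= np.exp(eta * gains)      # multiplicative-weights update
    return total / T

# Scan over commitments; the best commitment approximates the Stackelberg
# value, since the learner's no-regret play concentrates on a best response.
best_q, best_val = max(
    ((q, average_utility(np.array([q, 1.0 - q]))) for q in np.linspace(0, 1, 21)),
    key=lambda t: t[1],
)
print(f"best commitment q = {best_q:.2f}, average utility = {best_val:.3f}")
```

Note that the learner in this sketch has only two actions, so by the paper's two-action result the optimizer should not be able to do better than the Stackelberg utility here. Reproducing the "strictly higher than Stackelberg" behavior would require a game with three or more learner actions and a time-varying play for the optimizer, which the paper characterizes as the solution to a control problem.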