Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Policy Regret in Repeated Games
Authors: Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri
NeurIPS 2018 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | The notion of policy regret in online learning is a well defined performance measure for the common scenario of adaptive adversaries, which more traditional quantities such as external regret do not take into account. We revisit the notion of policy regret and first show that there are online learning settings in which policy regret and external regret are incompatible: any sequence of play that achieves a favorable regret with respect to one definition must do poorly with respect to the other. We then focus on the game-theoretic setting where the adversary is a self-interested agent. In that setting, we show that external regret and policy regret are not in conflict and, in fact, that a wide class of algorithms can ensure a favorable regret with respect to both definitions, so long as the adversary is also using such an algorithm. We also show that the sequence of play of no-policy regret algorithms converges to a policy equilibrium, a new notion of equilibrium that we introduce. Relating this back to external regret, we show that coarse correlated equilibria, which no-external regret players converge to, are a strict subset of policy equilibria. Thus, in game-theoretic settings, every sequence of play with no external regret also admits no policy regret, but the converse does not hold. |
| Researcher Affiliation | Collaboration | Raman Arora Dept. of Computer Science Johns Hopkins University Baltimore, MD 21204 EMAIL Michael Dinitz Dept. of Computer Science Johns Hopkins University Baltimore, MD 21204 EMAIL Teodor V. Marinov Dept. of Computer Science Johns Hopkins University Baltimore, MD 21204 EMAIL Mehryar Mohri Courant Institute and Google Research New York, NY 10012 EMAIL |
| Pseudocode | No | The paper is theoretical and focuses on proofs and definitions; it does not provide any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention providing open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not involve empirical studies with datasets. Therefore, it does not discuss public datasets or their availability. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical studies with datasets. Therefore, it does not discuss dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any experiments that would require specific hardware. No hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe any experiments that would require specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and does not describe any empirical experimental setups, hyperparameters, or training settings. |