Regret Minimization and Convergence to Equilibria in General-sum Markov Games
Authors: Liad Erez, Tal Lancewicki, Uri Sherman, Tomer Koren, Yishay Mansour
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we present the first (to our knowledge) algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. The bounds we obtain are for swap regret, and thus, along the way, imply convergence to a correlated equilibrium. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents. |
| Researcher Affiliation | Collaboration | ¹Blavatnik School of Computer Science, Tel Aviv University, Israel; ²Google Research, Tel Aviv. |
| Pseudocode | Yes | Algorithm 1 Policy Optimization by Swap Regret Minimization |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that source code for the described methodology is publicly available, included as supplementary material, or hosted in a repository. |
| Open Datasets | No | The paper is theoretical and does not describe or use any specific dataset for training. Therefore, it does not provide concrete access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments with datasets. As such, it does not provide specific dataset split information for validation. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used to run experiments or computations. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers required to replicate experiments or computations. |
| Experiment Setup | No | The paper is theoretical and describes algorithm parameters for its analytical bounds (e.g., 'parameter γ > 0, learning rate η > 0, regularizer R(·)' and specific choices such as η = 1/(96H²mSA) for its theorems), but it does not provide an experimental setup with hyperparameters or system-level training settings for actual experiments. (An illustrative swap-regret sketch follows this table.) |
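
The table above references Algorithm 1 ("Policy Optimization by Swap Regret Minimization") and its learning-rate and regularizer parameters, but, being a reproducibility summary, it contains no code. For orientation only, the following is a minimal, hypothetical Python sketch of the classical Blum–Mansour swap-regret-to-external-regret reduction in the normal-form (single-state) setting. It is not the paper's Algorithm 1, which operates on Markov games; the class name, the learning rate value `eta`, and the Hedge-style updates are illustrative assumptions.

```python
import numpy as np


class SwapRegretMinimizer:
    """Illustrative Blum-Mansour style swap-regret minimizer over a finite
    action set, built from one Hedge (multiplicative-weights) copy per action.
    This is NOT the paper's Algorithm 1; it only sketches the generic
    swap-regret reduction that such policy-optimization methods build on."""

    def __init__(self, num_actions: int, eta: float = 0.1):
        self.A = num_actions
        self.eta = eta  # learning rate (hypothetical choice, not from the paper)
        # One positive weight vector per "expert" copy, one copy per action.
        self.weights = np.ones((num_actions, num_actions))
        self.last_play = None

    def act(self) -> np.ndarray:
        # Row i: distribution recommended by the i-th Hedge copy.
        Q = self.weights / self.weights.sum(axis=1, keepdims=True)
        # Play the stationary distribution p of the row-stochastic matrix Q,
        # i.e. p = p Q, computed here by power iteration.
        p = np.full(self.A, 1.0 / self.A)
        for _ in range(1000):
            p_next = p @ Q
            if np.allclose(p_next, p, atol=1e-12):
                break
            p = p_next
        self.last_play = p / p.sum()
        return self.last_play

    def update(self, rewards: np.ndarray) -> None:
        # Feed each Hedge copy the per-action reward vector scaled by the
        # probability with which its recommendation was actually played.
        p = self.last_play
        self.weights *= np.exp(self.eta * p[:, None] * rewards[None, :])


# Toy usage: two such minimizers playing a random 2-action general-sum matrix game.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    payoff1 = rng.random((2, 2))  # player 1's payoffs (rows: own actions)
    payoff2 = rng.random((2, 2))  # player 2's payoffs (rows: player 1's actions)
    alg1, alg2 = SwapRegretMinimizer(2), SwapRegretMinimizer(2)
    for _ in range(500):
        p1, p2 = alg1.act(), alg2.act()
        alg1.update(payoff1 @ p2)    # expected reward of each of player 1's actions
        alg2.update(payoff2.T @ p1)  # expected reward of each of player 2's actions
    print("play distribution (player 1):", np.round(alg1.act(), 3))
```

Playing the stationary distribution of the matrix of recommendations is what converts the per-copy external-regret guarantees into a swap-regret bound, which in turn is what yields convergence of the empirical play to a correlated equilibrium in the self-play setting discussed in the paper.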