Online Learning in Unknown Markov Games
Authors: Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | We present an algorithm that achieves a sublinear O(K^{2/3}) regret after K episodes. This is the first sublinear regret bound (to our knowledge) for online learning in unknown Markov games. Importantly, our regret bound is independent of the size of the opponents' action spaces. As a result, even when the opponents' actions are fully observable, our regret bound improves upon existing analysis (e.g., (Xie et al., 2020)) by an exponential factor in the number of opponents. Theorem 2 (Regret bounds). For any p ∈ (0, 1), let ι = log(HSAK/p). If we run V-OL with our hyperparameter specification (3) for some large constant c > 0 and G ≥ 1 in an online two-player zero-sum MG, then with probability at least 1 − p, the regret in K episodes satisfies Regret(K) = O(GH³Sι² + √(GH⁵SAKι) + G⁻¹KH). |
| Researcher Affiliation | Academia | 1Department of EECS, MIT 2Department of Computer Science, Princeton University. |
| Pseudocode | Yes | Algorithm 1 Optimistic Nash V-learning for Online Learning (V-OL) |
| Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository. |
| Open Datasets | No | The paper is theoretical and does not describe the use of any specific datasets for training, nor does it provide information about their public availability. |
| Dataset Splits | No | As this is a theoretical paper, it does not describe experimental validation or dataset splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments. |
| Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and describes an algorithm's parameters as part of its theoretical formulation (e.g., hyperparameter specification (3)), but does not provide specific experimental setup details like concrete hyperparameter values or training configurations for an empirical study. |
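The O(K^{2/3}) rate quoted in the Research Type row follows from the three-term bound of Theorem 2 by tuning the hyperparameter G. A brief sketch (our simplification, treating H, S, A, and ι as constants; the paper's own tuning may track these factors explicitly):

```latex
\[
\mathrm{Regret}(K)
  \;=\; O\Big(\underbrace{G H^{3} S \iota^{2}}_{\text{independent of } K}
  \;+\; \underbrace{\sqrt{G H^{5} S A K \iota}}_{\propto\, \sqrt{GK}}
  \;+\; \underbrace{G^{-1} K H}_{\propto\, K/G}\Big).
\]
% Treating H, S, A, and \iota as constants, the two K-dependent terms
% balance when \sqrt{GK} = K/G, i.e. G^{3/2} = \sqrt{K}, so G = \Theta(K^{1/3}).
\[
\sqrt{GK} = \frac{K}{G}
\;\Longleftrightarrow\;
G = \Theta\big(K^{1/3}\big)
\;\Longrightarrow\;
\mathrm{Regret}(K) = O\big(\sqrt{K^{1/3} \cdot K}\big) = O\big(K^{2/3}\big).
\]
```

With this choice of G the bound is sublinear in K, matching the claim, while remaining independent of the opponents' action-space sizes.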