Online Learning in Unknown Markov Games

Authors: Yi Tian, Yuanhao Wang, Tiancheng Yu, Suvrit Sra

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | We present an algorithm that achieves sublinear $O(K^{2/3})$ regret after $K$ episodes. This is the first sublinear regret bound (to our knowledge) for online learning in unknown Markov games. Importantly, our regret bound is independent of the size of the opponents' action spaces. As a result, even when the opponents' actions are fully observable, our regret bound improves upon existing analyses (e.g., Xie et al., 2020) by a factor exponential in the number of opponents. Theorem 2 (Regret bounds). For any $p \in (0, 1)$, let $\iota = \log(HSAK/p)$. If we run V-OL with the hyperparameter specification (3) for some large constant $c > 0$ and $G \ge 1$ in an online two-player zero-sum MG, then with probability at least $1 - p$, the regret in $K$ episodes satisfies $\mathrm{Regret}(K) = O\big(GH^3 S \iota^2 + \sqrt{GH^5 SAK\iota} + G^{-1}KH\big)$.
Researcher Affiliation | Academia | Department of EECS, MIT; Department of Computer Science, Princeton University.
Pseudocode | Yes | Algorithm 1: Optimistic Nash V-learning for Online Learning (V-OL)
Open Source Code | No | The paper does not contain any statement about making its source code available or provide a link to a code repository.
Open Datasets | No | The paper is theoretical and does not describe the use of any specific datasets for training, nor does it provide information about their public availability.
Dataset Splits | No | As this is a theoretical paper, it does not describe experimental validation or dataset splits for training, validation, or testing.
Hardware Specification | No | The paper is theoretical and does not describe any specific hardware used for experiments.
Software Dependencies | No | The paper is theoretical and does not specify any software dependencies with version numbers.
Experiment Setup | No | The paper is theoretical and describes an algorithm's parameters as part of its theoretical formulation (e.g., hyperparameter specification (3)), but does not provide specific experimental setup details such as concrete hyperparameter values or training configurations for an empirical study.
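The $O(K^{2/3})$ rate quoted in the Research Type row follows from tuning the hyperparameter $G$. A minimal sketch, assuming the regret bound takes the form $O\big(GH^3 S \iota^2 + \sqrt{GH^5 SAK\iota} + G^{-1}KH\big)$ (the exact choice of $G$ below is illustrative, not a quotation from the paper):

```latex
% Balance the two K-dependent terms over the choice of G >= 1:
%   sqrt(G H^5 S A K \iota) = G^{-1} K H
% Squaring gives G H^5 S A K \iota = G^{-2} K^2 H^2, i.e.
%   G^3 = K H^2 / (H^5 S A \iota).
\[
  G \;=\; \left(\frac{K H^2}{H^5 S A \iota}\right)^{1/3}.
\]
% Substituting back into G^{-1} K H:
\[
  G^{-1} K H
  \;=\; K H \cdot \left(\frac{H^5 S A \iota}{K H^2}\right)^{1/3}
  \;=\; H^{2} (S A \iota)^{1/3}\, K^{2/3},
\]
% so both K-dependent terms scale as K^{2/3}, which is sublinear in K.
```

The remaining term $GH^3 S\iota^2$ grows only as $K^{1/3}$ under this choice, so it does not affect the leading rate.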