reproducibilityindex.ai

Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits

Authors: Qinghua Liu, Yuanhao Wang, Chi Jin

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Along this direction, we present a new complete set of positive and negative results: When the policies of the opponents are revealed at the end of each episode, we propose new efficient algorithms achieving √K-regret bounds... This is complemented with an exponential lower bound... When the policies of the opponents are not revealed, we prove a statistical hardness result... To summarize, we provide a complete set of results including both efficient algorithms and fundamental limits for no-regret learning in Markov games with adversarial opponents.
Researcher Affiliation	Academia	Qinghua Liu*¹, Yuanhao Wang*¹, Chi Jin¹ (¹Princeton University, New Jersey, USA). Correspondence to: Qinghua Liu <qinghual@princeton.edu>.
Pseudocode	Yes	Algorithm 1 Optimistic Policy EXP3, Subroutine 1 Optimistic Policy Evaluation, Algorithm 2 Adaptive Optimistic Policy EXP3, Subroutine 2 Optimistic Best Response
Open Source Code	No	The paper does not provide any specific link or explicit statement about releasing the source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not describe experiments using publicly available datasets.
Dataset Splits	No	The paper is theoretical and does not involve empirical experiments with dataset splits.
Hardware Specification	No	The paper is theoretical and does not describe hardware specifications used for experiments.
Software Dependencies	No	The paper is theoretical and focuses on algorithms and mathematical proofs; therefore, it does not list specific software dependencies with version numbers.
Experiment Setup	No	The paper is theoretical and focuses on algorithmic design and mathematical analysis, rather than detailing an empirical experimental setup with hyperparameters or training configurations.
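The algorithm names listed under Pseudocode refer to EXP3, the exponential-weights method for adversarial bandits. Since the paper releases no code, the following is only a minimal sketch of classic single-state EXP3, not the authors' Optimistic Policy EXP3: it omits optimistic policy evaluation, the Markov-game structure, and the revealed opponent policies. The function names `exp3` and `softmax`, the reward interface `reward_fn(t, arm)`, and the learning-rate choice are illustrative assumptions.

```python
import math
import random


def softmax(log_w):
    """Convert log-weights into a probability distribution (numerically stable)."""
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    z = sum(w)
    return [x / z for x in w]


def exp3(reward_fn, n_arms, n_rounds, eta, seed=0):
    """Classic EXP3 for an adversarial K-armed bandit (illustrative sketch).

    reward_fn(t, arm) is a hypothetical interface returning a reward in [0, 1];
    only the pulled arm's reward is observed. Returns the cumulative reward
    and the final sampling distribution.
    """
    rng = random.Random(seed)
    log_w = [0.0] * n_arms
    total = 0.0
    for t in range(n_rounds):
        probs = softmax(log_w)
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        total += reward
        # Importance-weighted estimate of the pulled arm's loss (1 - reward);
        # unpulled arms implicitly get an estimate of 0, keeping it unbiased.
        log_w[arm] -= eta * (1.0 - reward) / probs[arm]
    return total, softmax(log_w)


if __name__ == "__main__":
    K, T = 3, 2000
    # A common learning-rate choice for an O(sqrt(T K log K)) regret bound (assumption).
    eta = math.sqrt(2 * math.log(K) / (T * K))
    total, probs = exp3(lambda t, a: 1.0 if a == 0 else 0.0, K, T, eta)
    print(round(total), [round(p, 3) for p in probs])
```

In this toy run arm 0 always pays 1, so its sampling probability should approach 1 while the other arms' log-weights drift downward. Working in log-weights avoids overflow when the importance-weighted estimates become large.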