Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits

Authors: Qinghua Liu, Yuanhao Wang, Chi Jin

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | Along this direction, we present a new complete set of positive and negative results: When the policies of the opponents are revealed at the end of each episode, we propose new efficient algorithms achieving √K-regret bounds... This is complemented with an exponential lower bound... When the policies of the opponents are not revealed, we prove a statistical hardness result... To summarize, we provide a complete set of results including both efficient algorithms and fundamental limits for no-regret learning in Markov games with adversarial opponents.
Researcher Affiliation | Academia | Qinghua Liu*¹, Yuanhao Wang*¹, Chi Jin¹. ¹Princeton University, New Jersey, USA. Correspondence to: Qinghua Liu <qinghual@princeton.edu>.
Pseudocode | Yes | Algorithm 1: Optimistic Policy EXP3; Subroutine 1: Optimistic Policy Evaluation; Algorithm 2: Adaptive Optimistic Policy EXP3; Subroutine 2: Optimistic Best Response.
Open Source Code | No | The paper does not provide a link to, or an explicit statement about releasing, source code for the described methodology.
Open Datasets | No | The paper is theoretical and does not describe experiments using publicly available datasets.
Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with dataset splits.
Hardware Specification | No | The paper is theoretical and does not describe hardware used for experiments.
Software Dependencies | No | The paper is theoretical, focusing on algorithms and mathematical proofs, and therefore lists no software dependencies or version numbers.
Experiment Setup | No | The paper is theoretical, focusing on algorithmic design and mathematical analysis, and does not describe an empirical setup with hyperparameters or training configurations.
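
Since no code is released, the following is a minimal sketch of the generic exponential-weights (Hedge/EXP3-style) update over a finite policy class, the kind of no-regret rule the algorithm names above suggest. It is not the paper's Algorithm 1: the per-episode values are assumed to be supplied directly (the paper's optimistic policy evaluation is not reproduced), and the names `hedge_weights`, `run_hedge`, `value_rows`, and `eta` are hypothetical.

```python
import math


def hedge_weights(cum_values, eta):
    """Exponential-weights distribution over policies.

    cum_values[i] is the cumulative (estimated) value of policy i so far.
    """
    m = max(cum_values)  # subtract the max for numerical stability
    w = [math.exp(eta * (v - m)) for v in cum_values]
    z = sum(w)
    return [x / z for x in w]


def run_hedge(value_rows, eta):
    """Run exponential weights with full-information feedback.

    value_rows[k][i] is the value in [0, 1] of policy i in episode k
    (an assumption of this sketch: values are revealed for every policy).
    Returns the learner's total expected value and its regret against
    the best fixed policy in hindsight.
    """
    n = len(value_rows[0])
    cum = [0.0] * n
    learner_total = 0.0
    for row in value_rows:
        p = hedge_weights(cum, eta)
        # Expected value of sampling a policy from p in this episode.
        learner_total += sum(pi * v for pi, v in zip(p, row))
        # Update cumulative values only after committing to p.
        cum = [c + v for c, v in zip(cum, row)]
    regret = max(cum) - learner_total
    return learner_total, regret
```

With values in [0, 1], n policies, K episodes, and eta = sqrt(8 ln(n) / K), the standard Hedge analysis bounds the regret by sqrt(K ln(n) / 2) for any value sequence, which is the √K scaling the summary refers to.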