Learning Markov Games with Adversarial Opponents: Efficient Algorithms and Fundamental Limits
Authors: Qinghua Liu, Yuanhao Wang, Chi Jin
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Along this direction, we present a new complete set of positive and negative results: When the policies of the opponents are revealed at the end of each episode, we propose new efficient algorithms achieving √K-regret bounds... This is complemented with an exponential lower bound... When the policies of the opponents are not revealed, we prove a statistical hardness result... To summarize, we provide a complete set of results including both efficient algorithms and fundamental limits for no-regret learning in Markov games with adversarial opponents. |
| Researcher Affiliation | Academia | Qinghua Liu\*¹, Yuanhao Wang\*¹, Chi Jin¹; ¹Princeton University, New Jersey, USA. Correspondence to: Qinghua Liu <qinghual@princeton.edu>. |
| Pseudocode | Yes | Algorithm 1: Optimistic Policy EXP3; Subroutine 1: Optimistic Policy Evaluation; Algorithm 2: Adaptive Optimistic Policy EXP3; Subroutine 2: Optimistic Best Response |
| Open Source Code | No | The paper does not provide any specific link or explicit statement about releasing the source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not describe experiments using publicly available datasets. |
| Dataset Splits | No | The paper is theoretical and does not involve empirical experiments with dataset splits. |
| Hardware Specification | No | The paper is theoretical and does not describe hardware specifications used for experiments. |
| Software Dependencies | No | The paper is theoretical and focuses on algorithms and mathematical proofs, therefore it does not list specific software dependencies with version numbers. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithmic design and mathematical analysis, rather than detailing an empirical experimental setup with hyperparameters or training configurations. |