Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games
Authors: Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | Theoretical analysis demonstrates that the posterior sampling algorithm admits a √T-regret bound for problems with a low multi-agent decoupling coefficient, which is a new complexity measure for MGs, where T denotes the number of episodes. When specialized to linear MGs, the obtained regret bound matches the state-of-the-art results. To the best of our knowledge, this is the first provably efficient posterior sampling algorithm for MGs with frequentist regret guarantees, which enriches the toolbox for MGs and promotes the broad applicability of posterior sampling. |
| Researcher Affiliation | Collaboration | (1) The Hong Kong University of Science and Technology; (2) Center for Data Science, Peking University; (3) University of Virginia; (4) Google Research. |
| Pseudocode | Yes | Algorithm 1 Conditional Posterior Sampling with Booster; Algorithm 2 Main(F, η, D, T, λ); Algorithm 3 Booster(F, η, D, µf, T, λ) |
| Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific datasets. Therefore, no information about publicly available training datasets is provided. |
| Dataset Splits | No | The paper is theoretical and does not conduct experiments. Therefore, no information regarding dataset splits for validation is provided. |
| Hardware Specification | No | The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe empirical experiments. Therefore, no software dependencies with version numbers are listed. |
| Experiment Setup | No | The paper is theoretical and does not describe empirical experiments with hyperparameter tuning or training configurations. Therefore, no experimental setup details are provided. |