reproducibilityindex.ai

A Self-Play Posterior Sampling Algorithm for Zero-Sum Markov Games

Authors: Wei Xiong, Han Zhong, Chengshuai Shi, Cong Shen, Tong Zhang

ICML 2022

Reproducibility Variable	Result	LLM Response
Research Type	Theoretical	Theoretical analysis demonstrates that the posterior sampling algorithm admits a √T-regret bound, where T denotes the number of episodes, for problems with a low multi-agent decoupling coefficient, a new complexity measure for Markov games (MGs). When specialized to linear MGs, the obtained regret bound matches the state-of-the-art results. To the best of our knowledge, this is the first provably efficient posterior sampling algorithm for MGs with frequentist regret guarantees, which enriches the toolbox for MGs and promotes the broad applicability of posterior sampling.
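As a hedged illustration of what a √T-regret bound means in the self-play setting, the expression below writes regret as the cumulative duality gap of the policy pairs (μ^t, ν^t) played in episode t, a standard definition in the zero-sum MG literature that is assumed here; the paper's exact notation may differ, and the Õ hides logarithmic factors and the dependence on the decoupling coefficient and horizon, which are not reproduced. Here V^{†,ν} denotes the value of a best response to ν, V^{μ,†} the value of μ against its best response, and s_1 the initial state; each summand is nonnegative. Dividing by T shows that the average duality gap shrinks at rate 1/√T.

```latex
% Assumed duality-gap regret for zero-sum MG self-play (notation is illustrative)
\mathrm{Reg}(T) = \sum_{t=1}^{T} \Big( V_1^{\dagger,\nu^t}(s_1) - V_1^{\mu^t,\dagger}(s_1) \Big)
  \le \widetilde{O}\big(\sqrt{T}\big)
\quad\Longrightarrow\quad
\frac{\mathrm{Reg}(T)}{T} \le \widetilde{O}\!\left(\frac{1}{\sqrt{T}}\right) \xrightarrow[T \to \infty]{} 0
```

In words, a sublinear regret bound of this form implies that the policies produced by self-play approach a Nash equilibrium on average as the number of episodes grows.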
Researcher Affiliation	Collaboration	The Hong Kong University of Science and Technology; Center for Data Science, Peking University; University of Virginia; Google Research.
Pseudocode	Yes	Algorithm 1 Conditional Posterior Sampling with Booster; Algorithm 2 Main(F, η, D, T, λ); Algorithm 3 Booster(F, η, D, µf, T, λ)
Open Source Code	No	The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets	No	The paper is theoretical and does not conduct experiments on specific datasets. Therefore, no information about publicly available training datasets is provided.
Dataset Splits	No	The paper is theoretical and does not conduct experiments. Therefore, no information regarding dataset splits for validation is provided.
Hardware Specification	No	The paper is theoretical and does not describe empirical experiments. Therefore, no hardware specifications are mentioned.
Software Dependencies	No	The paper is theoretical and does not describe empirical experiments. Therefore, no software dependencies with version numbers are listed.
Experiment Setup	No	The paper is theoretical and does not describe empirical experiments with hyperparameter tuning or training configurations. Therefore, no experimental setup details are provided.