Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Authors: Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide a numerical example to verify our theory and investigate the importance of smooth value updates, and find that using eager value updates instead (equivalent to the independent natural policy gradient algorithm) may significantly slow down the convergence, even on a simple game with H = 2 layers. We perform numerical studies on the various policy optimization algorithms. |
| Researcher Affiliation | Collaboration | Runyu Zhang (Harvard University, runyuzhang@fas.harvard.edu); Qinghua Liu (Princeton University, qinghual@princeton.edu); Huan Wang (Salesforce Research, huan.wang@salesforce.com); Caiming Xiong (Salesforce Research, cxiong@salesforce.com); Na Li (Harvard University, nali@seas.harvard.edu); Yu Bai (Salesforce Research, yu.bai@salesforce.com) |
| Pseudocode | Yes | Algorithm 1 Algorithm framework for two-player zero-sum Markov Games |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplemental material. |
| Open Datasets | No | The paper uses a 'carefully constructed zero-sum Markov game' for its numerical example and designs 'a simple zero-sum Markov game with two layers and small state/action spaces'. It does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper designs a simulated Markov game environment and specifies the number of iterations (T) for learning, but it does not describe specific train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | We test all three algorithms above on this game, with this initialization, T ∈ {10^3, 3·10^3, 10^4, ..., 10^7}, and η chosen correspondingly as described above. |
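The experiment-setup row only records an iteration grid and a step size "chosen correspondingly." A hypothetical sketch of such a sweep is below; the `run_algorithm` stub, the placeholder algorithm names, the truncated grid, and the 1/√T step-size rule are all assumptions for illustration, not details from the paper.

```python
import math

def run_algorithm(name, T, eta):
    """Hypothetical stand-in for one policy-optimization run.

    Returns a dummy duality-gap value that shrinks with T, mimicking
    the O(1/sqrt(T)) convergence rates typical of such analyses.
    """
    return 1.0 / math.sqrt(T)

# First few iteration counts from the paper's grid T in {10^3, 3*10^3, 10^4, ...}.
Ts = [10**3, 3 * 10**3, 10**4]

results = {}
for T in Ts:
    # Assumed step-size rule; the paper only says eta is "chosen correspondingly".
    eta = 1.0 / math.sqrt(T)
    for algo in ("algorithm_A", "algorithm_B", "algorithm_C"):
        results[(algo, T)] = run_algorithm(algo, T, eta)

# Larger T should yield a smaller gap for each algorithm.
assert results[("algorithm_A", 10**4)] < results[("algorithm_A", 10**3)]
```

Such a sweep script, together with fixed random seeds, is the kind of artifact the "Open Source Code" row indicates the authors supply in their supplemental material.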