Policy Optimization for Markov Games: Unified Framework and Faster Convergence
Authors: Runyu Zhang, Qinghua Liu, Huan Wang, Caiming Xiong, Na Li, Yu Bai
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide a numerical example to verify our theory and investigate the importance of smooth value updates, and find that using eager value updates instead (equivalent to the independent natural policy gradient algorithm) may significantly slow down the convergence, even on a simple game with H = 2 layers. We perform numerical studies on the various policy optimization algorithms. |
| Researcher Affiliation | Collaboration | Runyu Zhang (Harvard University, runyuzhang@fas.harvard.edu); Qinghua Liu (Princeton University, qinghual@princeton.edu); Huan Wang (Salesforce Research, huan.wang@salesforce.com); Caiming Xiong (Salesforce Research, cxiong@salesforce.com); Na Li (Harvard University, nali@seas.harvard.edu); Yu Bai (Salesforce Research, yu.bai@salesforce.com) |
| Pseudocode | Yes | Algorithm 1 Algorithm framework for two-player zero-sum Markov Games |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplemental material. |
| Open Datasets | No | The paper uses a 'carefully constructed zero-sum Markov game' for its numerical example and designs 'a simple zero-sum Markov game with two layers and small state/action spaces'. It does not provide access information for a publicly available or open dataset. |
| Dataset Splits | No | The paper designs a simulated Markov game environment and specifies the number of iterations (T) for learning, but it does not describe specific train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | We test all three algorithms above on this game, with this initialization, T ∈ {10^3, 3·10^3, 10^4, ..., 10^7}, and η chosen correspondingly as described above. |
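The experiment-setup row only records an iteration grid and a step size "chosen correspondingly." A hypothetical sketch of such a sweep is below; the `run_algorithm` stub, the placeholder algorithm names, the truncated grid, and the 1/√T step-size rule are all assumptions for illustration, not details from the paper.

```python
import math

def run_algorithm(name, T, eta):
    """Hypothetical stand-in for one policy-optimization run.

    Returns a dummy duality-gap value that shrinks with T, mimicking
    the O(1/sqrt(T)) convergence rates typical of such analyses.
    """
    return 1.0 / math.sqrt(T)

# First few iteration counts from the paper's grid T in {10^3, 3*10^3, 10^4, ...}.
Ts = [10**3, 3 * 10**3, 10**4]

results = {}
for T in Ts:
    # Assumed step-size rule; the paper only says eta is "chosen correspondingly".
    eta = 1.0 / math.sqrt(T)
    for algo in ("algorithm_A", "algorithm_B", "algorithm_C"):
        results[(algo, T)] = run_algorithm(algo, T, eta)

# Larger T should yield a smaller gap for each algorithm.
assert results[("algorithm_A", 10**4)] < results[("algorithm_A", 10**3)]
```

Such a sweep script, together with fixed random seeds, is the kind of artifact the "Open Source Code" row indicates the authors supply in their supplemental material.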