Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
Authors: Sihan Zeng, Thinh Doan, Justin Romberg
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm. In this section, we numerically verify the convergence of Algorithm 2 on small-scale synthetic Markov games. |
| Researcher Affiliation | Academia | Sihan Zeng Dept. of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30318 szeng30@gatech.edu Thinh Doan Dept. of Electrical and Computer Engineering Virginia Tech Blacksburg, VA 24061 thinhdoan@vt.edu Justin Romberg Dept. of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30318 jrom@ece.gatech.edu |
| Pseudocode | Yes | Algorithm 1: Nested-Loop Policy Gradient Descent Ascent Algorithm with Piecewise Constant Regularization Weight; Algorithm 2: Policy Gradient Descent Ascent Algorithm with Diminishing Regularization Weight |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is in the supplementary material. |
| Open Datasets | No | The paper uses 'small-scale synthetic Markov games' and states 'we first choose the reward and transition probability kernel' for the experiments. It does not mention using or providing access to any specific publicly available dataset. |
| Dataset Splits | No | The paper describes numerical simulations on 'small-scale synthetic Markov games' but does not explicitly provide details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] The experiment is very small-scale and the computational resource used is negligible.' Therefore, no specific hardware details are provided. |
| Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We run Algorithm 2 for 50000 iterations with α_k = 10^{-3}, β_k = 10^{-2}, τ_k = (k + 1)^{-1/3}, and measure the convergence of π_k and φ_k by the metrics considered in (13) and (14) of Theorem 2. |
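The setup row above can be illustrated with a minimal sketch of entropy-regularized gradient descent ascent with a diminishing regularization weight, in the spirit of Algorithm 2. This is not the paper's implementation: it runs on a small zero-sum *matrix* game (a hypothetical stand-in for the synthetic Markov games), uses multiplicative-weights updates, and only the step sizes and the τ_k = (k + 1)^{-1/3} schedule are taken from the setup row; the payoff matrix `A` and random seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # zero-sum payoff: min player x pays x^T A y

n = A.shape[0]
x = np.ones(n) / n                # min player's mixed strategy (uniform init)
y = np.ones(n) / n                # max player's mixed strategy (uniform init)

alpha, beta = 1e-3, 1e-2          # step sizes as reported in the setup row
for k in range(50_000):
    tau = (k + 1) ** (-1 / 3)     # diminishing regularization weight

    # Entropy-regularized gradients:
    # descent for x on x^T A y + tau * <x, log x>,
    # ascent  for y on x^T A y - tau * <y, log y>.
    gx = A @ y + tau * np.log(x)
    gy = A.T @ x - tau * np.log(y)

    # Multiplicative-weights (mirror) steps keep both iterates on the simplex.
    x = x * np.exp(-alpha * gx)
    x /= x.sum()
    y = y * np.exp(beta * gy)
    y /= y.sum()

# Duality gap of the unregularized game: should shrink as (x, y)
# approach a Nash equilibrium.
gap = (A.T @ x).max() - (A @ y).min()
```

Tracking `gap` over iterations plays the role of the convergence metrics in (13) and (14) of Theorem 2; for the actual Markov-game experiments, the code in the paper's supplementary material is the authoritative reference.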