Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games
Authors: Sihan Zeng, Thinh Doan, Justin Romberg
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we complement the analysis with numerical simulations that illustrate the accelerated convergence of the algorithm. In this section, we numerically verify the convergence of Algorithm 2 on small-scale synthetic Markov games. |
| Researcher Affiliation | Academia | Sihan Zeng Dept. of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30318 szeng30@gatech.edu Thinh Doan Dept. of Electrical and Computer Engineering Virginia Tech Blacksburg, VA 24061 thinhdoan@vt.edu Justin Romberg Dept. of Electrical and Computer Engineering Georgia Institute of Technology Atlanta, GA 30318 jrom@ece.gatech.edu |
| Pseudocode | Yes | Algorithm 1: Nested-Loop Policy Gradient Descent Ascent Algorithm with Piecewise Constant Regularization Weight; Algorithm 2: Policy Gradient Descent Ascent Algorithm with Diminishing Regularization Weight |
| Open Source Code | Yes | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] The code is in the supplementary material. |
| Open Datasets | No | The paper uses 'small-scale synthetic Markov games' and states 'we first choose the reward and transition probability kernel' for the experiments. It does not mention using or providing access to any specific publicly available dataset. |
| Dataset Splits | No | The paper describes numerical simulations on 'small-scale synthetic Markov games' but does not explicitly provide details about training, validation, or test dataset splits. |
| Hardware Specification | No | The paper states: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [No] The experiment is very small-scale and the computational resource used is negligible.' Therefore, no specific hardware details are provided. |
| Software Dependencies | No | The paper does not explicitly list any software dependencies with specific version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | We run Algorithm 2 for 50000 iterations with α_k = 10^{-3}, β_k = 10^{-2}, τ_k = (k + 1)^{-1/3}, and measure the convergence of π_k and φ_k by the metrics considered in (13) and (14) of Theorem 2. |
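The setup row above can be illustrated with a minimal sketch of entropy-regularized gradient descent ascent with a diminishing regularization weight, in the spirit of Algorithm 2. This is not the paper's implementation: it runs on a small zero-sum *matrix* game (a hypothetical stand-in for the synthetic Markov games), uses multiplicative-weights updates, and only the step sizes and the τ_k = (k + 1)^{-1/3} schedule are taken from the setup row; the payoff matrix `A` and random seed are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # zero-sum payoff: min player x pays x^T A y

n = A.shape[0]
x = np.ones(n) / n                # min player's mixed strategy (uniform init)
y = np.ones(n) / n                # max player's mixed strategy (uniform init)

alpha, beta = 1e-3, 1e-2          # step sizes as reported in the setup row
for k in range(50_000):
    tau = (k + 1) ** (-1 / 3)     # diminishing regularization weight

    # Entropy-regularized gradients:
    # descent for x on x^T A y + tau * <x, log x>,
    # ascent  for y on x^T A y - tau * <y, log y>.
    gx = A @ y + tau * np.log(x)
    gy = A.T @ x - tau * np.log(y)

    # Multiplicative-weights (mirror) steps keep both iterates on the simplex.
    x = x * np.exp(-alpha * gx)
    x /= x.sum()
    y = y * np.exp(beta * gy)
    y /= y.sum()

# Duality gap of the unregularized game: should shrink as (x, y)
# approach a Nash equilibrium.
gap = (A.T @ x).max() - (A @ y).min()
```

Tracking `gap` over iterations plays the role of the convergence metrics in (13) and (14) of Theorem 2; for the actual Markov-game experiments, the code in the paper's supplementary material is the authoritative reference.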