reproducibilityindex.ai

Policy Gradient in Robust MDPs with Global Convergence Guarantee

Authors: Qiuhao Wang, Chin Pang Ho, Marek Petrik

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties.
Researcher Affiliation	Academia	(1) School of Data Science, City University of Hong Kong; (2) Department of Computer Science, University of New Hampshire.
Pseudocode	Yes	Algorithm 1 Double-Loop Robust Policy Gradient (DRPG)
Open Source Code	Yes	To facilitate the reproducibility of the domains, the full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG.
Open Datasets	Yes	The full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. The repository also contains CSV files with the precise specification of the RMDPs being solved.
Dataset Splits	No	The paper does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification	Yes	All algorithms are implemented in Python 3.8.8, and performed on a computer with an i7-11700 CPU with 16GB RAM.
Software Dependencies	Yes	All algorithms are implemented in Python 3.8.8... We use Gurobi 9.5.2 to solve any linear or quadratic optimization problems involved.
Experiment Setup	Yes	The updating step size for ξ = (θ, λ) on the inner problem are taken 0.01. For simplicity, we choose all elements of λc as one and θc := [0.4, 0.9], and set κθ = 1, κλ = 1 in this problem.
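
For anyone re-running the experiments, the values quoted in the Experiment Setup row can be gathered in one place. The following is a minimal sketch, not code from the authors' repository: the class name `InnerProblemSetup`, its field names, and the helper `make_lambda_c` are hypothetical, and the length of λc is not stated in the excerpt, so it is left as a parameter.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class InnerProblemSetup:
    """Values quoted in the Experiment Setup row (field names are hypothetical)."""

    step_size: float = 0.01                   # step size for xi = (theta, lambda)
    theta_c: Tuple[float, float] = (0.4, 0.9)  # theta_c := [0.4, 0.9]
    kappa_theta: float = 1.0                  # kappa_theta = 1
    kappa_lambda: float = 1.0                 # kappa_lambda = 1


def make_lambda_c(n: int) -> List[float]:
    """Return lambda_c with all elements equal to one.

    The dimension n is an assumption supplied by the caller; the excerpt
    only says that every element of lambda_c is one.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    return [1.0] * n


if __name__ == "__main__":
    setup = InnerProblemSetup()
    print(setup)
    print(make_lambda_c(3))
```

Keeping the reported constants in a frozen dataclass makes it harder to change them by accident between runs. Anything not quoted in the table, such as iteration counts or tolerances, would need to be taken from the paper or the repository.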