Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Policy Gradient in Robust MDPs with Global Convergence Guarantee
Authors: Qiuhao Wang, Chin Pang Ho, Marek Petrik
ICML 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties. |
| Researcher Affiliation | Academia | ¹School of Data Science, City University of Hong Kong; ²Department of Computer Science, University of New Hampshire. |
| Pseudocode | Yes | Algorithm 1 Double-Loop Robust Policy Gradient (DRPG) |
| Open Source Code | Yes | To facilitate the reproducibility of the domains, the full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. |
| Open Datasets | Yes | The full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. The repository also contains CSV files with the precise specification of the RMDPs being solved. |
| Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All algorithms are implemented in Python 3.8.8, and performed on a computer with an i7-11700 CPU with 16GB RAM. |
| Software Dependencies | Yes | All algorithms are implemented in Python 3.8.8... We use Gurobi 9.5.2 to solve any linear or quadratic optimization problems involved. |
| Experiment Setup | Yes | The updating step size for ξ = (θ, λ) on the inner problem are taken 0.01. For simplicity, we choose all elements of λc as one and θc := [0.4, 0.9], and set κθ = 1, κλ = 1 in this problem. |