Policy Gradient in Robust MDPs with Global Convergence Guarantee
Authors: Qiuhao Wang, Chin Pang Ho, Marek Petrik
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties. |
| Researcher Affiliation | Academia | School of Data Science, City University of Hong Kong; Department of Computer Science, University of New Hampshire. |
| Pseudocode | Yes (a hedged sketch follows the table) | Algorithm 1 Double-Loop Robust Policy Gradient (DRPG) |
| Open Source Code | Yes | To facilitate the reproducibility of the domains, the full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. |
| Open Datasets | Yes | The full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. The repository also contains CSV files with the precise specification of the RMDPs being solved. |
| Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | Yes | All algorithms are implemented in Python 3.8.8, and performed on a computer with an i7-11700 CPU with 16GB RAM. |
| Software Dependencies | Yes | All algorithms are implemented in Python 3.8.8... We use Gurobi 9.5.2 to solve any linear or quadratic optimization problems involved. |
| Experiment Setup | Yes | The updating step size for ξ = (θ, λ) on the inner problem is taken as 0.01. For simplicity, we choose all elements of λc as one and θc := [0.4, 0.9], and set κθ = 1, κλ = 1 in this problem. |
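
The DRPG pseudocode quoted in the table (Algorithm 1) alternates an inner adversarial update over transition kernels with an outer policy-gradient step. The Python sketch below illustrates that double-loop structure on a toy tabular robust MDP with an L2 ambiguity ball around a nominal kernel. Everything here is an assumption for illustration: the problem sizes, rewards, radius `kappa`, the finite-difference gradients, and the outer step size are not from the paper; only the inner step size 0.01 echoes the quoted setup.

```python
import numpy as np

# Toy tabular RMDP; sizes, rewards, and the L2 ambiguity radius are
# illustrative assumptions, not taken from the paper.
S, A = 3, 2                                     # states, actions
gamma = 0.9                                     # discount factor
rng = np.random.default_rng(0)
R = rng.uniform(0, 1, size=(S, A))              # reward table r(s, a)
P_nom = rng.dirichlet(np.ones(S), size=(S, A))  # nominal kernel p(.|s, a)
kappa = 0.1                                     # ambiguity-ball radius (assumed)
rho = np.full(S, 1.0 / S)                       # initial state distribution

def value(pi, P):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

def project_simplex(x):
    """Euclidean projection of the last axis of x onto the simplex."""
    u = np.sort(x, axis=-1)[..., ::-1]
    css = np.cumsum(u, axis=-1) - 1.0
    idx = np.arange(1, x.shape[-1] + 1)
    k = (u - css / idx > 0).sum(axis=-1) - 1
    tau = np.take_along_axis(css, k[..., None], axis=-1)
    return np.maximum(x - tau / (k + 1)[..., None], 0.0)

def grad_fd(f, x, eps=1e-5):
    """Central finite-difference gradient; adequate for this toy sketch."""
    g = np.zeros_like(x)
    it = np.nditer(x, flags=["multi_index"])
    for _ in it:
        i = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[i] += eps
        xm[i] -= eps
        g[i] = (f(xp) - f(xm)) / (2 * eps)
    return g

pi = np.full((S, A), 1.0 / A)  # uniform initial policy
P = P_nom.copy()
for outer in range(50):
    # Inner loop: the adversary descends on the return to approximate the
    # worst-case kernel inside the L2 ball around the nominal kernel.
    for inner in range(20):
        gP = grad_fd(lambda Pk: rho @ value(pi, Pk), P)
        P = project_simplex(P - 0.01 * gP)  # 0.01 echoes the quoted setup
        diff = P - P_nom
        nrm = np.linalg.norm(diff)
        if nrm > kappa:
            # Convex combination of two kernels, so rows stay on the simplex.
            P = P_nom + (kappa / nrm) * diff
    # Outer step: projected gradient ascent on the policy at the worst kernel.
    gpi = grad_fd(lambda pik: rho @ value(pik, P), pi)
    pi = project_simplex(pi + 0.05 * gpi)

print("approximate robust return:", rho @ value(pi, P))
```

Note that the paper's actual DRPG solves the inner problem to a prescribed tolerance and covers more general ambiguity sets; this sketch fixes a small inner budget and an L2 ball purely to make the double-loop structure concrete.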