Policy Gradient in Robust MDPs with Global Convergence Guarantee

Authors: Qiuhao Wang, Chin Pang Ho, Marek Petrik

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, our numerical results demonstrate the utility of our new algorithm and confirm its global convergence properties."
Researcher Affiliation | Academia | School of Data Science, City University of Hong Kong; Department of Computer Science, University of New Hampshire.
Pseudocode | Yes | "Algorithm 1: Double-Loop Robust Policy Gradient (DRPG)" (a hedged sketch of this double-loop structure follows the table)
Open Source Code | Yes | "To facilitate the reproducibility of the domains, the full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG."
Open Datasets | Yes | "The full source code, which was used to generate them, is available at https://github.com/JerrisonWang/ICML-DRPG. The repository also contains CSV files with the precise specification of the RMDPs being solved."
Dataset Splits | No | The paper does not provide specific training, validation, or test dataset splits (e.g., percentages or sample counts).
Hardware Specification | Yes | "All algorithms are implemented in Python 3.8.8, and performed on a computer with an i7-11700 CPU with 16GB RAM."
Software Dependencies | Yes | "All algorithms are implemented in Python 3.8.8... We use Gurobi 9.5.2 to solve any linear or quadratic optimization problems involved."
Experiment Setup | Yes | "The updating step size for ξ = (θ, λ) on the inner problem is taken as 0.01. For simplicity, we choose all elements of λ_c as one and θ_c := [0.4, 0.9], and set κ_θ = 1, κ_λ = 1 in this problem."
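The "Pseudocode" row refers to the paper's double-loop scheme: an inner loop that approximately solves for the worst-case transition kernel against the current policy, and an outer loop that takes a projected policy-gradient step against that worst-case kernel. Below is a minimal sketch of that structure, not the authors' implementation: the tabular reward-maximizing RMDP, the simplex-only adversary projection (the paper uses (s,a)-rectangular ambiguity sets and solves the resulting linear/quadratic subproblems with Gurobi), and the names `drpg`, `eta_pi`, and `eta_xi` are all illustrative assumptions; only the inner step size of 0.01 is taken from the reported experiment setup.

```python
# Hedged sketch of a double-loop robust policy gradient in the spirit of
# Algorithm 1 (DRPG). Not the authors' code: tabular MDP, simplex-relaxed
# adversary, and hyperparameter names are illustrative assumptions.
import numpy as np

def project_simplex(v):
    """Euclidean projection of each row (last axis) onto the probability simplex."""
    u = np.sort(v, axis=-1)[..., ::-1]            # sort descending
    css = np.cumsum(u, axis=-1) - 1.0
    k = np.arange(1, v.shape[-1] + 1)
    rho = (u - css / k > 0).sum(axis=-1, keepdims=True)
    theta = np.take_along_axis(css, rho - 1, axis=-1) / rho
    return np.maximum(v - theta, 0.0)

def value_and_grads(pi, p, r, gamma, mu):
    """Return J(pi, p) and its gradients w.r.t. the policy and the kernel."""
    S, _ = r.shape
    P_pi = np.einsum('sa,sat->st', pi, p)          # policy-induced kernel
    r_pi = (pi * r).sum(axis=1)
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    d = np.linalg.solve(np.eye(S) - gamma * P_pi.T, mu)   # discounted occupancy
    Q = r + gamma * np.einsum('sat,t->sa', p, V)
    grad_pi = d[:, None] * Q                       # policy gradient theorem
    grad_p = gamma * d[:, None, None] * pi[:, :, None] * V[None, None, :]
    return mu @ V, grad_pi, grad_p

def drpg(r, p0, gamma=0.9, outer_iters=200, inner_iters=50,
         eta_pi=0.1, eta_xi=0.01):                 # 0.01 matches the reported inner step size
    S, A = r.shape
    mu = np.full(S, 1.0 / S)                       # uniform initial distribution (assumption)
    pi = np.full((S, A), 1.0 / A)
    for _ in range(outer_iters):
        # Inner loop: adversary descends on the kernel (worst case for pi).
        p = p0.copy()
        for _ in range(inner_iters):
            _, _, grad_p = value_and_grads(pi, p, r, gamma, mu)
            p = project_simplex(p - eta_xi * grad_p)   # real DRPG projects onto the ambiguity set
        # Outer loop: projected policy-gradient ascent against the worst kernel.
        _, grad_pi, _ = value_and_grads(pi, p, r, gamma, mu)
        pi = project_simplex(pi + eta_pi * grad_pi)
    return pi

# Toy invocation on a random 4-state, 2-action MDP (purely illustrative):
rng = np.random.default_rng(0)
S, A = 4, 2
r = rng.random((S, A))
p0 = project_simplex(rng.random((S, A, S)))
pi_robust = drpg(r, p0)
```

The double-loop order matters: the inner adversary is driven (approximately) to its worst case before each policy update, which is the structure underlying the paper's global convergence argument; a faithful reproduction would replace the simplex projection with the paper's Gurobi-based projection onto the RMDP's ambiguity set.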