A Single-Loop Robust Policy Gradient Method for Robust Markov Decision Processes

Authors: Zhenwei Lin, Chenyu Xue, Qi Deng, Yinyu Ye

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Numerical experiments validate the efficacy of SRPG, demonstrating its faster and more robust convergence behavior compared to its nested-loop counterpart." (Abstract) and "We conduct several experiments to investigate the performance of SRPG compared with DRPG (Wang et al., 2023). In particular, we consider two different problems, including GARNET MDPs and an inventory management problem." (Section 5)
Researcher Affiliation | Academia | 1. Shanghai University of Finance and Economics; 2. Antai College of Economics and Management, Shanghai Jiao Tong University; 3. Stanford University. Correspondence to: Qi Deng <qdeng24@sjtu.edu.cn>.
Pseudocode | Yes | Algorithm 1 "Single-loop Robust Policy Gradient Method" (see the update sketch after this table).
Open Source Code | Yes | "We provide the code in this link."
Open Datasets | No | The paper's GARNET MDPs are randomly generated, and neither they nor the inventory management problem come with access information (link, DOI, or citation for a public instance). For example, "We randomly generate the nominal transition kernel p according to two different GARNET MDPs: GARNET(5, 6, 3) and GARNET(10, 5, 10)." (Section 5.1). A GARNET generation sketch appears after this table.
Dataset Splits | No | The paper does not report dataset splits (e.g., percentages or sample counts for training, validation, and test sets) for its experiments.
Hardware Specification | No | The paper does not report hardware details (e.g., GPU model, CPU model, or memory) for its experiments; it mentions only the use of "the state-of-the-art commercial solver GUROBI" (Section 5).
Software Dependencies | No | The paper cites "the state-of-the-art commercial solver GUROBI (Gurobi Optimization, LLC, 2023)" but does not specify a version number for GUROBI or any other software dependency.
Experiment Setup | Yes | "We let the discount factor γ = 0.95, and sample the cost c_{s,a} i.i.d. from the uniform distribution supported on [0, 5]. ... We choose the primal stepsize τ and dual stepsize σ from {0.01, 0.05, 0.1}. We also choose the extrapolation parameters β and µ from {0.01, 0.05, 0.1, 0.2, 0.4} for SRPG. For DRPG, we also tune its primal and dual stepsizes from {0.01, 0.05, 0.1}." (Section 5.1). A sketch enumerating this grid appears after this table.
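
The paper's Algorithm 1 is not reproduced here. As a reading aid, the following is a minimal, hypothetical sketch of what a single-loop primal-dual robust policy gradient update can look like, wired to the hyperparameter names from Section 5.1 (primal stepsize τ, dual stepsize σ, extrapolation parameters β and µ). The gradient oracles, the projections, and the exact placement of the extrapolation steps are assumptions, not the paper's algorithm.

```python
import numpy as np

def srpg_like_update(grad_pi, grad_p, proj_pi, proj_p, pi0, p0,
                     tau=0.05, sigma=0.05, beta=0.1, mu=0.1, iters=1000):
    """Hypothetical single-loop primal-dual iteration (not the paper's
    Algorithm 1): one projected policy (primal) step and one projected
    kernel (dual) step per iteration, with simple extrapolation."""
    pi, p = np.array(pi0, dtype=float), np.array(p0, dtype=float)
    pi_prev, p_prev = pi.copy(), p.copy()
    for _ in range(iters):
        # Extrapolated iterates; beta and mu mimic the roles of the
        # extrapolation parameters tuned in Section 5.1.
        pi_bar = pi + beta * (pi - pi_prev)
        p_bar = p + mu * (p - p_prev)
        pi_prev, p_prev = pi.copy(), p.copy()
        # Simultaneous descent on the policy and ascent on the kernel:
        # one step of each per iteration, hence "single loop".
        pi = proj_pi(pi - tau * grad_pi(pi_bar, p_bar))
        p = proj_p(p + sigma * grad_p(pi_bar, p_bar))
    return pi, p
```

The point of the sketch is the single-loop structure the abstract contrasts with a nested-loop counterpart: rather than solving the inner maximization over the uncertainty set to completion at every iteration, the adversarial kernel takes a single ascent step alongside each policy step.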
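The GARNET(n_s, n_a, b) instances quoted in the Open Datasets row are randomly generated rather than downloaded. For reference, the sketch below follows the common GARNET recipe (transition mass spread over b randomly chosen successor states per state-action pair); the paper's exact construction may differ, and garnet_kernel is a name introduced here for illustration.

```python
import numpy as np

def garnet_kernel(n_states, n_actions, branching, seed=0):
    """Generate a GARNET-style nominal transition kernel: for each
    (state, action) pair, probability mass is split among `branching`
    randomly chosen successor states."""
    rng = np.random.default_rng(seed)
    p = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            succ = rng.choice(n_states, size=branching, replace=False)
            # Random cut points on [0, 1] induce the branch probabilities.
            cuts = np.sort(rng.uniform(size=branching - 1))
            probs = np.diff(np.concatenate(([0.0], cuts, [1.0])))
            p[s, a, succ] = probs
    return p
```

Under these assumptions, garnet_kernel(5, 6, 3) and garnet_kernel(10, 5, 10) would mirror the two instances quoted from Section 5.1.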
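Finally, the quoted experiment setup fixes everything except which grid point performs best. A minimal enumeration of that hyperparameter sweep, assuming the GARNET(5, 6, 3) dimensions and leaving the actual SRPG run as a stub, could look like:

```python
import itertools
import numpy as np

# Setup quoted in Section 5.1: gamma = 0.95, costs c_{s,a} ~ Uniform[0, 5].
n_states, n_actions = 5, 6  # GARNET(5, 6, 3) dimensions, assumed here
gamma = 0.95
rng = np.random.default_rng(0)
cost = rng.uniform(0.0, 5.0, size=(n_states, n_actions))

# Hyperparameter grids quoted for SRPG; DRPG reuses the stepsize grids.
taus = [0.01, 0.05, 0.1]               # primal stepsize tau
sigmas = [0.01, 0.05, 0.1]             # dual stepsize sigma
extras = [0.01, 0.05, 0.1, 0.2, 0.4]   # extrapolation parameters beta, mu

grid = list(itertools.product(taus, sigmas, extras, extras))
print(f"{len(grid)} SRPG configurations")  # 3 * 3 * 5 * 5 = 225
for tau, sigma, beta, mu in grid:
    pass  # run SRPG with (tau, sigma, beta, mu) and record convergence
```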