reproducibilityindex.ai

Robust Satisficing MDPs

Authors: Haolin Ruan, Siyu Zhou, Zhi Chen, Chin Pang Ho

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results demonstrate that RSMDPs can prescribe policies to achieve their targets, which are much higher than the optimal worst-case returns computed by robust MDPs. Moreover, the average and percentile performances of our model are competitive among other models. We also demonstrate the scalability of the proposed algorithm compared with a state-of-the-art commercial solver.
Researcher Affiliation	Academia	(1) School of Data Science, City University of Hong Kong; (2) The City University of Hong Kong Shenzhen Research Institute; (3) CUHK Business School, The Chinese University of Hong Kong.
Pseudocode	Yes	Algorithm 1 Primal-Dual Algorithm (PDA) for Problem (9) ... Algorithm 2 Interval Search Algorithm for Problem (21) ... Algorithm 3 Interval Search Algorithm for Problem (11) ... Algorithm 4 Solve the inner minimization problem in (26)
Open Source Code	Yes	The code and data to reproduce our experiments is available online at https://github.com/RUANHaolin/RSMDPs.
Open Datasets	Yes	We compare the performances of the proposed RSMDPs with NMDPs, RMDPs and DRMDPs in three applications: river swim (Strehl & Littman, 2008), machine replacement (Delage & Mannor, 2010) and grid world (Ghavamzadeh et al., 2016)
Dataset Splits	No	The paper mentions that parameters are selected by 'cross validation' but does not specify concrete dataset splits (e.g., percentages or counts) for training, validation, and testing.
Hardware Specification	Yes	All optimization problems are solved on an Intel 3.6 GHz processor with 32GB RAM.
Software Dependencies	Yes	To solve RSMDPs, we design a first-order method that is more scalable than the Gurobi solver (Gurobi Optimization, LLC, 2022) ... and Mosek (academic license) is utilized to solve the inner minimization problem in each iteration (MOSEK ApS, 2022).
Experiment Setup	Yes	In our experiments, we synthetically generate random RSMDP instances, and the details of the experiments and parameters can be found in the Appendix D.7 ... We set the discount factor γ = 0.95, and the target τ = 0.85 Z_N.