Robust Satisficing MDPs

Authors: Haolin Ruan, Siyu Zhou, Zhi Chen, Chin Pang Ho

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate that RSMDPs can prescribe policies to achieve their targets, which are much higher than the optimal worst-case returns computed by robust MDPs. Moreover, the average and percentile performances of our model are competitive among other models. We also demonstrate the scalability of the proposed algorithm compared with a state-of-the-art commercial solver.
Researcher Affiliation | Academia | (1) School of Data Science, City University of Hong Kong; (2) The City University of Hong Kong Shenzhen Research Institute; (3) CUHK Business School, The Chinese University of Hong Kong.
Pseudocode | Yes | Algorithm 1 Primal-Dual Algorithm (PDA) for Problem (9) ... Algorithm 2 Interval Search Algorithm for Problem (21) ... Algorithm 3 Interval Search Algorithm for Problem (11) ... Algorithm 4 Solve the inner minimization problem in (26)
Open Source Code | Yes | The code and data to reproduce our experiments is available online at https://github.com/RUANHaolin/RSMDPs.
Open Datasets | Yes | We compare the performances of the proposed RSMDPs with NMDPs, RMDPs and DRMDPs in three applications: river swim (Strehl & Littman, 2008), machine replacement (Delage & Mannor, 2010) and grid world (Ghavamzadeh et al., 2016)
Dataset Splits | No | The paper mentions that parameters are selected by 'cross validation' but does not specify concrete dataset splits (e.g., percentages or counts) for training, validation, and testing.
Hardware Specification | Yes | All optimization problems are solved on an Intel 3.6 GHz processor with 32GB RAM.
Software Dependencies | Yes | To solve RSMDPs, we design a first-order method that is more scalable than the Gurobi solver (Gurobi Optimization, LLC, 2022) ... and Mosek (academic license) is utilized to solve the inner minimization problem in each iteration (MOSEK ApS, 2022).
Experiment Setup | Yes | In our experiments, we synthetically generate random RSMDP instances, and the details of the experiments and parameters can be found in the Appendix D.7 ... We set the discount factor γ = 0.95, and the target τ = 0.85 Z_N.
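
The reported setup (γ = 0.95 and a target set to 85% of Z_N) can be illustrated with a minimal sketch. It assumes that Z_N denotes the optimal expected discounted return of the nominal MDP, which the excerpt above does not state. The toy transition kernel `P`, rewards `R`, initial distribution `rho`, and the helper `value_iteration` are hypothetical and are not taken from the paper or its repository.

```python
GAMMA = 0.95            # discount factor reported in the experiment setup
TARGET_FRACTION = 0.85  # target tau = 0.85 * Z_N

def value_iteration(P, R, gamma, tol=1e-10, max_iter=100_000):
    """Plain value iteration on a nominal MDP (illustrative helper).

    P[s][a] is a list of next-state probabilities, R[s][a] a scalar reward.
    """
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(
                R[s][a] + gamma * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(len(P[s]))
            )
            for s in range(n_states)
        ]
        # Stop once successive iterates agree in the sup norm.
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v

# Hypothetical 2-state, 2-action nominal MDP (assumption, for illustration only).
P = [
    [[1.0, 0.0], [0.2, 0.8]],  # transitions from state 0 under actions 0 and 1
    [[0.0, 1.0], [0.9, 0.1]],  # transitions from state 1 under actions 0 and 1
]
R = [
    [0.0, 1.0],
    [2.0, 0.5],
]
rho = [0.5, 0.5]  # assumed initial state distribution

v = value_iteration(P, R, GAMMA)
z_n = sum(p * x for p, x in zip(rho, v))  # assumed meaning of Z_N
tau = TARGET_FRACTION * z_n               # target return under this assumption

print(f"Z_N = {z_n:.4f}, tau = {tau:.4f}")
```

Under this reading, the target sits below the nominal optimum, so a policy can in principle meet it when the estimated model is accurate.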