Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
Authors: Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We summarize our contributions as follows with a summary in Table 1: ...Our methods evaluate the optimal robust Bellman operators for s-rectangular uncertainty sets constrained by Lp norms, with time complexity O(S²A + SA log A) for p = 1, 2 and O(S²A + SA log(A/ε)) for general p. This is faster than existing methods by at least a factor of S, yielding efficient value iteration that can be used in large state-space problems. ...Table 4 and Figure 1 demonstrate the relative cost (time) of robust value iteration compared to non-robust MDPs for randomly generated kernel and reward functions with varying numbers of states S and actions A. Appendix H shows further results and details. |
| Researcher Affiliation | Collaboration | 1Technion, Haifa, Israel 2Nvidia, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1: s-rectangular L2 robust Bellman operator (see Algorithm 1 of Anava & Levy, 2016); Algorithm 2: Online s-rectangular Lp robust value iteration; Algorithm 3: Model-based Q-learning algorithm for sa-rectangular Lp robust MDPs; Algorithm 4: Model-based algorithm for s-rectangular Lp robust MDPs; Algorithm 5: Model-based algorithm for s-rectangular L1 robust MDPs |
| Open Source Code | No | The codes and results can be found at https://github.com/******. |
| Open Datasets | No | The paper mentions "randomly generated kernel and reward functions" for experiments, implying synthetic data, but does not provide access information for a specific public or open dataset. |
| Dataset Splits | No | The paper describes using "randomly generated kernel and reward functions" but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | Yes | The experiments are done on the following hardware: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz, 64-bit, memory 7862 MiB |
| Software Dependencies | No | Software: Experiments were done in Python, using numpy; scipy.optimize.linprog for linear programming in policy evaluation for s/sa-rectangular robust MDPs; and scipy.optimize.minimize with scipy.optimize.LinearConstraint for policy improvement in s-rectangular L1 robust MDPs. |
| Experiment Setup | Yes | Experiment parameters: number of states S (variable), number of actions A (variable), transition kernel and reward function generated randomly, discount factor 0.9, uncertainty radius 0.1 (for all states and actions, for convenience), number of iterations = 100, tolerance for binary search = 0.00001 |
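The experiment setup above (randomly generated kernel and reward, discount 0.9, fixed iteration count) can be sketched as follows. This is a minimal illustration in NumPy, not the authors' code: the function names are mine, and it runs only the non-robust value-iteration baseline against which the paper measures the relative cost of its robust Bellman operators.

```python
import numpy as np

def random_mdp(S, A, seed=0):
    """Randomly generated transition kernel and reward function,
    matching the paper's described setup (helper names are illustrative)."""
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)  # normalize rows into a valid kernel
    R = rng.random((S, A))              # rewards in [0, 1)
    return P, R

def value_iteration(P, R, gamma=0.9, iters=100):
    """Non-robust baseline. The paper's robust variant replaces the
    Bellman backup with an s-rectangular Lp robust Bellman operator."""
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * (P @ v)  # (S, A) Q-values from one-step lookahead
        v = q.max(axis=1)        # greedy backup
    return v
```

Timing this loop for varying S and A, and comparing against the same loop with a robust backup, reproduces the style of relative-cost measurement reported in Table 4 and Figure 1.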