Efficient Value Iteration for s-rectangular Robust Markov Decision Processes

Authors: Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We summarize our contributions as follows with a summary in Table 1: ...Our methods evaluate the optimal robust Bellman operators for s-rectangular uncertainty sets constrained by Lp norms, with time complexity of O(S²A + SA log A) for p = 1, 2 and O(S²A + SA log(A/ε)) for general p, which is faster than existing methods by at least a factor of S; this leads to efficient value iteration that can be used in large state-space problems. ...Table 4 and Figure 1 demonstrate the relative cost (time) of robust value iteration compared to non-robust MDP for randomly generated kernel and reward functions with varying numbers of states S and actions A. Appendix H shows more results and details.
Researcher Affiliation | Collaboration | ¹Technion, Haifa, Israel; ²Nvidia, Haifa, Israel.
Pseudocode | Yes | Algorithm 1: s-rectangular L2 robust Bellman operator (see Algorithm 1 of Anava & Levy, 2016); Algorithm 2: Online s-rectangular Lp robust value iteration; Algorithm 3: Model-based Q-learning algorithm for sa-rectangular Lp robust MDPs; Algorithm 4: Model-based algorithm for s-rectangular Lp robust MDPs; Algorithm 5: Model-based algorithm for s-rectangular L1 robust MDPs.
Open Source Code | No | The code and results can be found at https://github.com/******.
Open Datasets | No | The paper mentions "randomly generated kernel and reward functions" for experiments, implying synthetic data, but does not provide access information for a specific public or open dataset.
Dataset Splits | No | The paper describes using "randomly generated kernel and reward functions" but does not specify any training, validation, or test dataset splits.
Hardware Specification | Yes | The experiments are done on the following hardware: Intel(R) Core(TM) i5-4300U CPU @ 1.90 GHz, 64-bit, memory 7862 MiB.
Software Dependencies | No | Software: Experiments were done in Python, using numpy, scipy.optimize.linprog for linear programming for policy evaluation in s/sa-rectangular robust MDPs, scipy.optimize.minimize, and scipy.optimize.LinearConstraint for policy improvement in s-rectangular L1 robust MDPs.
Experiment Setup | Yes | Experiment parameters: number of states S (variable), number of actions A (variable), transition kernel and reward function generated randomly, discount factor 0.9, uncertainty radii = 0.1 (for all states and actions, for convenience), number of iterations = 100, tolerance for binary search = 0.00001.
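For intuition about what the robust Bellman operators quoted above compute, here is a minimal sketch of the inner worst-case problem for an L1 uncertainty ball. For simplicity it uses an sa-rectangular ball and the standard greedy mass-shifting solution, not the paper's faster s-rectangular operators; all function names and structure here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def worst_case_l1(p, v, alpha):
    """min p'.v over distributions p' with ||p' - p||_1 <= alpha,
    via the standard greedy argument: shift up to alpha/2 probability
    mass from the highest-value states onto the lowest-value state."""
    p = np.asarray(p, dtype=float).copy()
    v = np.asarray(v, dtype=float)
    lo = int(np.argmin(v))
    # Cannot move more mass onto state `lo` than exists elsewhere.
    budget = min(alpha / 2.0, 1.0 - p[lo])
    p[lo] += budget
    for s in np.argsort(v)[::-1]:  # drain the highest-value states first
        if s == lo:
            continue
        take = min(p[s], budget)
        p[s] -= take
        budget -= take
        if budget <= 0:
            break
    return float(p @ v)

def robust_bellman_backup(v, P, R, alpha, gamma=0.9):
    """One robust value-iteration step:
    v(s) <- max_a [ R(s,a) + gamma * worst-case E_{p'}[v] ]."""
    S, A = R.shape
    q = np.array([[R[s, a] + gamma * worst_case_l1(P[s, a], v, alpha)
                   for a in range(A)] for s in range(S)])
    return q.max(axis=1)
```

The naive loop above costs O(S²A log S) per backup; the point of the paper is that even the harder s-rectangular case can be evaluated in roughly O(S²A + SA log A), i.e. about as fast as a non-robust backup.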
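The experimental setup above (randomly generated kernel and reward, discount factor 0.9, 100 iterations) can be sketched as follows. The exact sampling scheme is an assumption on my part (uniform rewards and row-normalized uniform kernels); the value iteration shown is the plain non-robust baseline against which the paper times its robust variants.

```python
import numpy as np

def random_mdp(S, A, seed=0):
    """Randomly generated transition kernel and reward function,
    mirroring the quoted setup; the sampling distribution is assumed."""
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)  # each P[s, a] is a distribution over next states
    R = rng.random((S, A))
    return P, R

def value_iteration(P, R, gamma=0.9, iters=100):
    """Plain (non-robust) value iteration with the paper's stated
    discount factor and iteration count."""
    S, A = R.shape
    v = np.zeros(S)
    for _ in range(iters):
        v = (R + gamma * P @ v).max(axis=1)  # Bellman optimality backup
    return v
```

Timing this loop against a robust variant for varying S and A reproduces the kind of relative-cost comparison reported in Table 4 and Figure 1.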