Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
Authors: Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We summarize our contributions as follows with a summary in Table 1: ...Our methods evaluate the optimal robust Bellman operators for s-rectangular uncertainty sets constrained by Lp norms, with time complexity O(S²A + SA log A) for p = 1, 2 and O(S²A + SA log(A/ε)) for general p. This is faster than existing methods by at least a factor of S, yielding efficient value iteration that can be used in large state-space problems. ...Table 4 and Figure 1 demonstrate the relative cost (time) of robust value iteration compared to non-robust MDPs for randomly generated kernel and reward functions with varying numbers of states S and actions A. Appendix H shows further results and details. |
| Researcher Affiliation | Collaboration | 1Technion, Haifa, Israel 2Nvidia, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1: s-rectangular L2 robust Bellman operator (see Algorithm 1 of Anava & Levy, 2016); Algorithm 2: Online s-rectangular Lp robust value iteration; Algorithm 3: Model-based Q-learning algorithm for sa-rectangular Lp robust MDPs; Algorithm 4: Model-based algorithm for s-rectangular Lp robust MDPs; Algorithm 5: Model-based algorithm for s-rectangular L1 robust MDPs |
| Open Source Code | No | The codes and results can be found at https://github.com/******. |
| Open Datasets | No | The paper mentions "randomly generated kernel and reward functions" for experiments, implying synthetic data, but does not provide access information for a specific public or open dataset. |
| Dataset Splits | No | The paper describes using "randomly generated kernel and reward functions" but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | Yes | The experiments are done on the following hardware: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz, 64-bit, memory 7862 MiB |
| Software Dependencies | No | Software: Experiments were done in Python, using numpy; scipy.optimize.linprog for linear programming in policy evaluation for s/sa-rectangular robust MDPs; and scipy.optimize.minimize with scipy.optimize.LinearConstraint for policy improvement in s-rectangular L1 robust MDPs. |
| Experiment Setup | Yes | Experiment parameters: number of states S (variable), number of actions A (variable), transition kernel and reward function generated randomly, discount factor 0.9, uncertainty radius 0.1 (for all states and actions, for convenience), number of iterations = 100, tolerance for binary search = 0.00001 |
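The experiment setup above (randomly generated kernel and reward, discount 0.9, fixed iteration count) can be sketched as follows. This is a minimal illustration in NumPy, not the authors' code: the function names are mine, and it runs only the non-robust value-iteration baseline against which the paper measures the relative cost of its robust Bellman operators.

```python
import numpy as np

def random_mdp(S, A, seed=0):
    """Randomly generated transition kernel and reward function,
    matching the paper's described setup (helper names are illustrative)."""
    rng = np.random.default_rng(seed)
    P = rng.random((S, A, S))
    P /= P.sum(axis=-1, keepdims=True)  # normalize rows into a valid kernel
    R = rng.random((S, A))              # rewards in [0, 1)
    return P, R

def value_iteration(P, R, gamma=0.9, iters=100):
    """Non-robust baseline. The paper's robust variant replaces the
    Bellman backup with an s-rectangular Lp robust Bellman operator."""
    S, A, _ = P.shape
    v = np.zeros(S)
    for _ in range(iters):
        q = R + gamma * (P @ v)  # (S, A) Q-values from one-step lookahead
        v = q.max(axis=1)        # greedy backup
    return v
```

Timing this loop for varying S and A, and comparing against the same loop with a robust backup, reproduces the style of relative-cost measurement reported in Table 4 and Figure 1.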