Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..
Efficient Value Iteration for s-rectangular Robust Markov Decision Processes
Authors: Navdeep Kumar, Kaixin Wang, Kfir Yehuda Levy, Shie Mannor
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We summarize our contributions as follows with a summary in Table 1: ...Our methods evaluate the optimal robust Bellman operators for s-rectangular uncertainty sets constrained by Lp norms, with time complexity of O(S2A + SA log A) for p = 1, 2 and O(S2A + SA log A ϵ ) for general p, which is faster than existing methods by at least a factor of S, which leads to efficient value iteration which can be used in large state space problems. ...Table 4 and Figure 1 demonstrate the relative cost (time) of robust value iteration compared to non-robust MDP for randomly generated kernel and reward functions with varying numbers of states S and actions A. The Appendix H shows more results and details. |
| Researcher Affiliation | Collaboration | 1Technion, Haifa, Israel 2Nvidia, Haifa, Israel. |
| Pseudocode | Yes | Algorithm 1 s-rectangular L2 robust Bellman operator (see algorithm 1 of (Anava & Levy, 2016) ... Algorithm 2 Online s-rectangular Lp robust value iteration ... Algorithm 3 Model Based Q-Learning Algorithm for sa-rectangular Lp Robust MDP ... Algorithm 4 Model Based Algorithm for S rectangular Lp Robust MDP ... Algorithm 5 Model based algorithm for s-rectangular L1 robust MDPs |
| Open Source Code | No | The codes and results can be found at https://github.com/******. |
| Open Datasets | No | The paper mentions "randomly generated kernel and reward functions" for experiments, implying synthetic data, but does not provide access information for a specific public or open dataset. |
| Dataset Splits | No | The paper describes using "randomly generated kernel and reward functions" but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | Yes | The experiments are done on the following hardware: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz 64 bits, memory 7862Mi B |
| Software Dependencies | No | Software: Experiments were done in Python, using numpy, scipy.optimize.linprog for Linear programming for policy evaluation in s/sa rectangular robust MDPs, scipy.optimize.minize, and scipy.optimize.Linear Constraints for policy improvement in s-rectangular L1 robust MDPs. |
| Experiment Setup | Yes | Experiments parameters Number of states S (variable), number of actions A (variable), transition kernel and reward function generated randomly, discount factor 0.9, uncertainty radiuses =0.1 (for all states and action, just for convenience ), number of iterations = 100, tolerance for binary search = 0.00001 |