Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs
Authors: Bahram Behzadian, Marek Petrik, Chin Pang Ho
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results confirm the practical viability of our method and show that it outperforms a leading commercial optimization package by several orders of magnitude. |
| Researcher Affiliation | Academia | Bahram Behzadian University of New Hampshire bahram@cs.unh.edu Marek Petrik University of New Hampshire mpetrik@cs.unh.edu Chin Pang Ho City University of Hong Kong clint.ho@cityu.edu.hk |
| Pseudocode | Yes | Algorithm 1 Homotopy method to compute q(ξ) ... Algorithm 2 Bisection method for solving (7). |
| Open Source Code | No | The paper states that the algorithms are implemented in C++, but it does not provide any link or explicit statement about making the source code publicly available. |
| Open Datasets | Yes | As the main benchmark problem, we use the classic Inventory Management (IM) problem (Zipkin, 2000). To evaluate the performance of our methods on small problems, we also consider the River Swim (RS) domain (Strehl and Littman, 2008) and the Machine Replacement (MR) domain (Delage and Ye, 2010). |
| Dataset Splits | No | The paper describes experimental evaluation on standard MDP domains for runtime performance, but it does not specify training, validation, or test dataset splits as typically found in supervised machine learning contexts, as it concerns algorithm performance on a defined problem rather than model generalization from data. |
| Hardware Specification | Yes | The results were generated on a computer with an Intel i7-9700 CPU with 32 GB RAM; the algorithms are implemented in C++. |
| Software Dependencies | Yes | This section compares the empirical runtime of Algorithms 1 and 2 with the runtime of Gurobi 9.1, a leading LP solver. |
| Experiment Setup | Yes | The x-axis represents the number of states (maximum holding capacity) in the domain. The number of actions is the same as the number of states. We use the robustness budget κ = 1.2, but the computation time is insensitive to the particular choice of κ. |