Fast Algorithms for $L_\infty$-constrained S-rectangular Robust MDPs

Authors: Bahram Behzadian, Marek Petrik, Chin Pang Ho

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results confirm the practical viability of our method and show that it outperforms a leading commercial optimization package by several orders of magnitude.
Researcher Affiliation Academia Bahram Behzadian University of New Hampshire bahram@cs.unh.edu Marek Petrik University of New Hampshire mpetrik@cs.unh.edu Chin Pang Ho City University of Hong Kong clint.ho@cityu.edu.hk
Pseudocode Yes Algorithm 1 Homotopy method to compute q(ξ) ... Algorithm 2 Bisection method for solving (7).
Open Source Code No The paper states that the algorithms are implemented in C++, but it does not provide any link or explicit statement about making the source code publicly available.
Open Datasets Yes As the main benchmark problem, we use the classic Inventory Management (IM) problem (Zipkin, 2000). To evaluate the performance of our methods on small problems, we also consider the River Swim (RS) domain (Strehl and Littman, 2008) and the Machine Replacement (MR) domain (Delage and Ye, 2010).
Dataset Splits No The paper describes experimental evaluation on standard MDP domains for runtime performance, but it does not specify training, validation, or test dataset splits as typically found in supervised machine learning contexts, as it concerns algorithm performance on a defined problem rather than model generalization from data.
Hardware Specification Yes The results were generated on a computer with an Intel i7-9700 CPU with 32 GB RAM; the algorithms are implemented in C++.
Software Dependencies Yes This section compares the empirical runtime of Algorithms 1 and 2 with the runtime of Gurobi 9.1, a leading LP solver.
Experiment Setup Yes The x-axis represents the number of states (maximum holding capacity) in the domain. The number of actions is the same as the number of states. We use the robustness budget κ = 1.2, but the computation time is insensitive to the particular choice of κ.