Fast Bellman Updates for Robust MDPs

Authors: Chin Pang Ho, Marek Petrik, Wolfram Wiesemann

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experimental results indicate that the proposed methods are over 1,000 times faster than Gurobi, a state-of-the-art commercial optimization package, for small instances, and the performance gap grows considerably with problem size.
Researcher Affiliation Academia 1Imperial College London 2Department of Computer Science, University of New Hampshire.
Pseudocode Yes Algorithm 1: Homotopy method for q(ξ). Algorithm 2: Bisection algorithm to solve (7)
Open Source Code Yes All algorithms were implemented in C++ and the code is available from the publications section of http://cs.unh.edu/ mpetrik.
Open Datasets No The paper uses randomly generated data and a classic problem (inventory management) whose parameters are defined within the paper, rather than explicitly using or providing access to a pre-existing public dataset.
Dataset Splits No The paper does not provide specific details regarding training, validation, or test dataset splits.
Hardware Specification Yes All results were generated on a PC with i7-6700 3.4 GHz CPU with 32 GB RAM.
Software Dependencies Yes We compare the proposed methods with Gurobi 7.5, a stateof-the-art commercial LP solver. All algorithms were implemented in C++
Experiment Setup Yes We vary the number of states from 50 to 400 in increments of 50. Since the L1 norm distance between two distributions is always between 0 and 2, we consider κ {0.0, 0.25, 0.5, 0.75, . . . , 1.75, 2.0} and report average performances over those sizes. The same κ values are used for the weighted L1 norm although they may be greater than 2. The results are averaged over 5 runs. The holding cost, purchase cost, and sale price are 0.1, 1.0, and 1.6, respectively. There are no backlogs, and the inventory is limited by the number of states S. The demand is sampled from a normal distribution with mean S/2 and standard deviation S/5. The weights for the L1 norm are set to wi = 10/ pi (as suggested for the L2 norm in Iyengar (2005)) and clamped to the interval [0.3, 3.0]. The initial state is 0 (no inventory), and the value function is linear with slope 1.