Robust Policy Synthesis for Uncertain POMDPs via Convex Optimization
Authors: Marnix Suilen, Nils Jansen, Murat Cubuktepe, Ufuk Topcu
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the feasibility of our approach by means of several case studies that highlight typical bottlenecks for our problem. In particular, we show that we are able to solve benchmarks with hundreds of thousands of states, hundreds of different observations, and we investigate the effect of different levels of uncertainty in the models. We evaluate our robust synthesis procedure on benchmark examples that are subject to either reachability or expected cost specifications. |
| Researcher Affiliation | Academia | Marnix Suilen¹, Nils Jansen¹, Murat Cubuktepe² and Ufuk Topcu². ¹Department of Software Science, Radboud University, The Netherlands. ²Department of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin, USA |
| Pseudocode | No | The paper describes mathematical formulations and algorithmic steps conceptually but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions a 'prototype implementation' but does not provide any link or explicit statement about making their source code available. |
| Open Datasets | Yes | Grid-world robot is based on the POMDP example in [Littman et al., 1995]. The second example is a maze setting, introduced in [McCallum, 1993], where a robot is to reach a target location in minimal time, see Fig. 2(b). Our third example concerns scheduling wireless traffic, where at each time period a scheduler generates a new packet for each user [Yang et al., 2011]. |
| Dataset Splits | No | The paper describes varying levels of uncertainty ('nominal', 'small interval', 'big interval') for its examples but does not provide explicit details on data splits like training, validation, or test sets in the typical machine learning sense. |
| Hardware Specification | Yes | The experiments were performed on a computer with an Intel Core i9-9900u 2.50 GHz processor and 64 GB of RAM |
| Software Dependencies | Yes | The experiments were performed on a computer with an Intel Core i9-9900u 2.50 GHz processor and 64 GB of RAM with Gurobi 9.0 as the QCQP solver and our own implementation of a robust value iteration. As part of a Python toolchain, we use the probabilistic model checker Storm [Dehnert et al., 2017] to extract an explicit state space representation of uPOMDPs. |
| Experiment Setup | Yes | We use a 1 hour time-out (TO). For all our examples we use a standard POMDP model with so-called nominal probabilities, as well as two different sizes of probability intervals, namely a small one and a big one. |
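
The table mentions probability intervals and a robust value iteration, but the paper's implementation is not available. As a rough illustration only, the following minimal sketch shows how a robust value iteration might compute a worst-case reachability probability on an interval Markov chain (that is, after a policy has been fixed). It is not the authors' code: the function names, the dictionary-based model encoding, and the toy model are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

Interval = Tuple[float, float]          # (lower bound, upper bound)
Chain = Dict[str, Dict[str, Interval]]  # state -> successor -> interval


def worst_case_distribution(intervals: Dict[str, Interval],
                            values: Dict[str, float]) -> Dict[str, float]:
    """Pick the distribution within the intervals that minimizes expected value."""
    # Every successor receives at least its lower bound.
    probs = {s: lo for s, (lo, _) in intervals.items()}
    remaining = 1.0 - sum(probs.values())
    if remaining < -1e-9:
        raise ValueError("lower bounds sum to more than 1")
    # Hand the leftover mass to the lowest-valued successors first.
    for s in sorted(intervals, key=lambda t: values[t]):
        lo, hi = intervals[s]
        extra = min(hi - lo, max(remaining, 0.0))
        probs[s] += extra
        remaining -= extra
    if remaining > 1e-9:
        raise ValueError("upper bounds sum to less than 1")
    return probs


def robust_reachability(chain: Chain, targets: Set[str],
                        max_iterations: int = 10_000,
                        tol: float = 1e-10) -> Dict[str, float]:
    """Iterate the worst-case Bellman backup until the values stabilize."""
    values = {s: (1.0 if s in targets else 0.0) for s in chain}
    for _ in range(max_iterations):
        delta = 0.0
        for state, intervals in chain.items():
            if state in targets:
                continue
            probs = worst_case_distribution(intervals, values)
            new_value = sum(p * values[t] for t, p in probs.items())
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:
            break
    return values


# Hypothetical toy model: a "small" and a "big" interval around a nominal 0.7.
absorbing = {"goal": {"goal": (1.0, 1.0)}, "sink": {"sink": (1.0, 1.0)}}
small = {"s0": {"goal": (0.6, 0.8), "sink": (0.2, 0.4)}, **absorbing}
big = {"s0": {"goal": (0.5, 0.9), "sink": (0.1, 0.5)}, **absorbing}

small_result = robust_reachability(small, {"goal"})
big_result = robust_reachability(big, {"goal"})
```

In this toy model, widening the interval lowers the guaranteed reachability probability from `s0`, which is the kind of effect the small versus big interval comparison in the experiment setup is meant to expose.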