Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes

Authors: Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart

NeurIPS 2018

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
    In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.

Researcher Affiliation: Collaboration
    Andrea Tirinzoni (Politecnico di Milano, andrea.tirinzoni@polimi.it); Marek Petrik (University of New Hampshire, mpetrik@cs.unh.edu); Xiangli Chen (Amazon Robotics, cxiangli@amazon.com); Brian D. Ziebart (University of Illinois at Chicago, bziebart@uic.edu)

Pseudocode: Yes
    Algorithm 1: Min-Max Dynamic Programming.

Open Source Code: No
    The paper does not provide any explicit statement about releasing source code for the methodology, nor a link to a code repository.

Open Datasets: No
    The paper mentions collecting "50 trajectories under a uniform reference policy" for the gridworld and using a "state-space model with exponential dynamics adapted from Chapter 5 of [30]" for the invasive-species domain, but it does not provide concrete access, a link, or a citation to a publicly available dataset used for training. The collected trajectories are not shared.

Dataset Splits: No
    The paper does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).

Hardware Specification: No
    The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.

Software Dependencies: No
    The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or particular solvers).

Experiment Setup: Yes
    The hyperparameters were chosen as follows: λ = 1.0, α = 1000, η = 10⁻³, and T = 2N. The Lagrange multipliers ω were initialized to zero vectors. We use Nb = 200 belief states, uniformly discretized. In all experiments we used the same hyperparameters as in the gridworld example except for λ = 0.001.
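The table notes that the paper's main procedure is Algorithm 1, a min-max dynamic program. As a rough illustration of the min-max idea only — not the paper's exact algorithm, which handles policy-conditioned, non-rectangular uncertainty sets — the sketch below runs robust value iteration over a small finite set of candidate transition models; all function names, array shapes, and the demo MDP are assumptions made for illustration:

```python
import numpy as np

def minmax_dp(P_models, R, gamma=0.95, n_iter=200):
    """Robust value iteration: at each state, choose the action that
    maximizes the worst-case expected return over a finite set of
    candidate transition models (a rectangular-set simplification,
    not the paper's policy-conditioned formulation).

    P_models[m, s, a, t] = P(next state t | state s, action a) under model m
    R[s, a]              = immediate reward
    """
    n_models, S, A, _ = P_models.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        # Q[m, s, a]: one-step lookahead value under candidate model m.
        Q = R[None, :, :] + gamma * np.einsum('msat,t->msa', P_models, V)
        # Inner min over adversarial model choice, outer max over actions.
        V = Q.min(axis=0).max(axis=1)
    policy = Q.min(axis=0).argmax(axis=1)
    return V, policy

# Tiny demo: two identical self-loop models; action 0 in state 0 pays reward 1.
P = np.zeros((2, 2, 2, 2))
for s in range(2):
    P[:, s, :, s] = 1.0  # every action self-loops under both models
R = np.array([[1.0, 0.0], [0.0, 0.0]])
V, policy = minmax_dp(P, R)
```

In the demo, the robust policy keeps taking action 0 in state 0, so V[0] approaches 1/(1 − γ) = 20 while state 1 earns nothing.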