Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes
Authors: Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally. |
| Researcher Affiliation | Collaboration | Andrea Tirinzoni (Politecnico di Milano, andrea.tirinzoni@polimi.it); Xiangli Chen (Amazon Robotics, cxiangli@amazon.com); Marek Petrik (University of New Hampshire, mpetrik@cs.unh.edu); Brian D. Ziebart (University of Illinois at Chicago, bziebart@uic.edu) |
| Pseudocode | Yes | Algorithm 1: Min-max Dynamic Programming |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code for the methodology or a link to a code repository. |
| Open Datasets | No | The paper mentions collecting '50 trajectories under a uniform reference policy' for the gridworld and using a 'state-space model with exponential dynamics adapted from Chapter 5 of [30]' for the invasive species, but it does not provide concrete access, a link, or a citation to a publicly available dataset used for training. The collected trajectories are not shared. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or references to predefined splits). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or specific solvers with their versions). |
| Experiment Setup | Yes | The hyperparameters were chosen as follows: λ = 1.0, α = 1000, η = 10 3, and T = 2N. The Lagrange multipliers ω were initialized to zero vectors. We use Nb = 200 belief states, uniformly discretized. In all experiments we used the same hyperparameters as in the gridworld example except for λ = 0.001. |