Policy-Conditioned Uncertainty Sets for Robust Markov Decision Processes

Authors: Andrea Tirinzoni, Marek Petrik, Xiangli Chen, Brian Ziebart

NeurIPS 2018

Reproducibility Variable | Result | LLM Response

Research Type: Experimental
    In this work, we propose non-rectangular uncertainty sets that bound marginal moments of state-action features defined over entire trajectories through a decision process. This enables generalization to different portions of the state space while retaining appropriate uncertainty of the decision process. We develop algorithms for solving the resulting robust decision problems, which reduce to finding an optimal policy for a mixture of decision processes, and demonstrate the benefits of our approach experimentally.

Researcher Affiliation: Collaboration
    Andrea Tirinzoni (Politecnico di Milano, andrea.tirinzoni@polimi.it); Marek Petrik (University of New Hampshire, mpetrik@cs.unh.edu); Xiangli Chen (Amazon Robotics, cxiangli@amazon.com); Brian D. Ziebart (University of Illinois at Chicago, bziebart@uic.edu)

Pseudocode: Yes
    Algorithm 1: Min-Max Dynamic Programming.

Open Source Code: No
    The paper does not provide any explicit statement about releasing source code for the methodology, nor a link to a code repository.

Open Datasets: No
    The paper mentions collecting "50 trajectories under a uniform reference policy" for the gridworld and using a "state-space model with exponential dynamics adapted from Chapter 5 of [30]" for the invasive-species domain, but it does not provide concrete access, a link, or a citation to a publicly available dataset used for training. The collected trajectories are not shared.

Dataset Splits: No
    The paper does not specify explicit training, validation, or test splits (e.g., percentages, sample counts, or references to predefined splits).

Hardware Specification: No
    The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.

Software Dependencies: No
    The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, or particular solvers).

Experiment Setup: Yes
    The hyperparameters were chosen as follows: λ = 1.0, α = 1000, η = 10⁻³, and T = 2N. The Lagrange multipliers ω were initialized to zero vectors. We use Nb = 200 belief states, uniformly discretized. In all experiments we used the same hyperparameters as in the gridworld example except for λ = 0.001.
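The table notes that the paper's main procedure is Algorithm 1, a min-max dynamic program. As a rough illustration of the min-max idea only — not the paper's exact algorithm, which handles policy-conditioned, non-rectangular uncertainty sets — the sketch below runs robust value iteration over a small finite set of candidate transition models; all function names, array shapes, and the demo MDP are assumptions made for illustration:

```python
import numpy as np

def minmax_dp(P_models, R, gamma=0.95, n_iter=200):
    """Robust value iteration: at each state, choose the action that
    maximizes the worst-case expected return over a finite set of
    candidate transition models (a rectangular-set simplification,
    not the paper's policy-conditioned formulation).

    P_models[m, s, a, t] = P(next state t | state s, action a) under model m
    R[s, a]              = immediate reward
    """
    n_models, S, A, _ = P_models.shape
    V = np.zeros(S)
    for _ in range(n_iter):
        # Q[m, s, a]: one-step lookahead value under candidate model m.
        Q = R[None, :, :] + gamma * np.einsum('msat,t->msa', P_models, V)
        # Inner min over adversarial model choice, outer max over actions.
        V = Q.min(axis=0).max(axis=1)
    policy = Q.min(axis=0).argmax(axis=1)
    return V, policy

# Tiny demo: two identical self-loop models; action 0 in state 0 pays reward 1.
P = np.zeros((2, 2, 2, 2))
for s in range(2):
    P[:, s, :, s] = 1.0  # every action self-loops under both models
R = np.array([[1.0, 0.0], [0.0, 0.0]])
V, policy = minmax_dp(P, R)
```

In the demo, the robust policy keeps taking action 0 in state 0, so V[0] approaches 1/(1 − γ) = 20 while state 1 earns nothing.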