Finding Safe Zones of Markov Decision Processes Policies

Authors: Lee Cohen, Yishay Mansour, Michal Moshkovitz

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | While the main focus of this work is the introduction of the problem and the aforementioned theoretical guarantees, we do demonstrate the problem empirically, to provide additional intuition to the readers. In Section 5, we compare the performance of the naive approaches to FINDING SAFEZONE and show that different policies might lead to completely different SAFEZONES.
Researcher Affiliation | Collaboration | Lee Cohen (TTI-Chicago); Yishay Mansour (Tel-Aviv University, Google Research); Michal Moshkovitz (Bosch Center for AI)
Pseudocode | Yes | Algorithm 1 FINDING SAFEZONE... Algorithm 2 Est Safety Subroutine... Algorithm 3 Greedy by Threshold... Algorithm 4 Simulation Algorithm... Algorithm 5 Greedy at Each Step
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper uses a simulated grid-world environment defined by the authors and does not refer to a pre-existing publicly available dataset. It describes the environment's parameters (e.g., "grid of size N × N"), but this does not constitute a publicly accessible dataset in the conventional sense required by the question.
Dataset Splits | No | The paper mentions evaluating on a "test set containing 2000 random trajectories" but does not specify a separate validation split or the methodology for creating training/validation/test splits from a dataset. It defines the experimental environment directly.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions the simulated environment settings.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in its implementation or experiments.
Experiment Setup | Yes | The MDP. We focus on a grid of size N × N, for some parameter N. The agent starts off at the mid-left state, (0, N/2), and wishes to reach the (absorbing) goal state at (N − 1, N/2) with a minimal number of steps. At each step, it can take one of four actions: up, down, right, and left by 1 grid square. With probability 0.9, the intended action is performed and with probability 0.1 there is a drift down. The agent stops either way after H = 300 steps... We take N = 30 and 2000 episodes.
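The described environment is simple enough to sketch directly from the stated parameters. Below is a minimal simulator, assuming coordinates (x, y) with x the column (0 = left) and y the row, so the start is (0, N/2) and the goal is (N − 1, N/2) as in the paper; the always-right baseline policy and the boundary-clipping behavior are illustrative assumptions, not details taken from the paper.

```python
import random

N = 30          # grid side length (paper uses N = 30)
H = 300         # horizon: the agent stops after H steps
GOAL = (N - 1, N // 2)

MOVES = {"up": (0, 1), "down": (0, -1), "right": (1, 0), "left": (-1, 0)}

def step(state, action, rng):
    """With prob. 0.9 apply the intended action; with prob. 0.1 drift down."""
    if state == GOAL:          # goal state is absorbing
        return state
    dx, dy = MOVES[action] if rng.random() < 0.9 else MOVES["down"]
    x, y = state
    # Clip to the grid (an assumption: the paper does not specify boundaries).
    return (min(max(x + dx, 0), N - 1), min(max(y + dy, 0), N - 1))

def sample_trajectory(policy, rng):
    """Roll out one episode from the mid-left start state."""
    state = (0, N // 2)
    traj = [state]
    for _ in range(H):
        state = step(state, policy(state), rng)
        traj.append(state)
        if state == GOAL:
            break
    return traj

# Hypothetical baseline policy: always head right toward the goal column.
rng = random.Random(0)
trajs = [sample_trajectory(lambda s: "right", rng) for _ in range(2000)]
```

A set of 2000 trajectories like `trajs` matches the scale of the paper's test set; the actual experiments compare trajectories induced by different (learned or heuristic) policies, which would replace the `lambda s: "right"` placeholder here.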