Finding Safe Zones of Markov Decision Processes Policies

Authors: Lee Cohen, Yishay Mansour, Michal Moshkovitz

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | While the main focus of this work is the introduction of the problem and the aforementioned theoretical guarantees, we do demonstrate the problem empirically, to provide additional intuition to the readers. In Section 5, we compare the performance of the naive approaches to FINDING SAFEZONE and show that different policies might lead to completely different SAFEZONES.
Researcher Affiliation | Collaboration | Lee Cohen (TTI-Chicago); Yishay Mansour (Tel-Aviv University, Google Research); Michal Moshkovitz (Bosch Center for AI)
Pseudocode | Yes | Algorithm 1 FINDING SAFEZONE... Algorithm 2 Est Safety Subroutine... Algorithm 3 Greedy by Threshold... Algorithm 4 Simulation Algorithm... Algorithm 5 Greedy at Each Step
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper uses a simulated grid-world environment defined by the authors and does not refer to a pre-existing publicly available dataset. It describes the environment's parameters (e.g., "grid of size N × N"), but this does not constitute a publicly accessible dataset in the conventional sense required by the question.
Dataset Splits | No | The paper mentions evaluating on a "test set containing 2000 random trajectories" but does not specify a separate validation split or the methodology for creating training/validation/test splits from a dataset. It defines the experimental environment directly.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU models, CPU types, memory) used for running the experiments. It only mentions the simulated environment settings.
Software Dependencies | No | The paper does not provide specific version numbers for any software components, libraries, or frameworks used in its implementation or experiments.
Experiment Setup | Yes | The MDP. We focus on a grid of size N × N, for some parameter N. The agent starts off at the mid-left state, (0, N/2), and wishes to reach the (absorbing) goal state at (N − 1, N/2) with a minimal number of steps. At each step, it can take one of four actions: up, down, right, and left by 1 grid square. With probability 0.9, the intended action is performed and with probability 0.1 there is a drift down. The agent stops either way after H = 300 steps... We take N = 30 and 2000 episodes.
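The described environment is simple enough to sketch directly from the stated parameters. Below is a minimal simulator, assuming coordinates (x, y) with x the column (0 = left) and y the row, so the start is (0, N/2) and the goal is (N − 1, N/2) as in the paper; the always-right baseline policy and the boundary-clipping behavior are illustrative assumptions, not details taken from the paper.

```python
import random

N = 30          # grid side length (paper uses N = 30)
H = 300         # horizon: the agent stops after H steps
GOAL = (N - 1, N // 2)

MOVES = {"up": (0, 1), "down": (0, -1), "right": (1, 0), "left": (-1, 0)}

def step(state, action, rng):
    """With prob. 0.9 apply the intended action; with prob. 0.1 drift down."""
    if state == GOAL:          # goal state is absorbing
        return state
    dx, dy = MOVES[action] if rng.random() < 0.9 else MOVES["down"]
    x, y = state
    # Clip to the grid (an assumption: the paper does not specify boundaries).
    return (min(max(x + dx, 0), N - 1), min(max(y + dy, 0), N - 1))

def sample_trajectory(policy, rng):
    """Roll out one episode from the mid-left start state."""
    state = (0, N // 2)
    traj = [state]
    for _ in range(H):
        state = step(state, policy(state), rng)
        traj.append(state)
        if state == GOAL:
            break
    return traj

# Hypothetical baseline policy: always head right toward the goal column.
rng = random.Random(0)
trajs = [sample_trajectory(lambda s: "right", rng) for _ in range(2000)]
```

A set of 2000 trajectories like `trajs` matches the scale of the paper's test set; the actual experiments compare trajectories induced by different (learned or heuristic) policies, which would replace the `lambda s: "right"` placeholder here.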