Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Authors: Chongyang Shi, Yuheng Bu, Jie Fu

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiment Evaluation), Example 1 (Grid World Example): "The effectiveness of the proposed optimal opacity-enforcement planning algorithms is illustrated through a stochastic grid world example shown in Figure 1."
Researcher Affiliation | Academia | "Chongyang Shi, Yuheng Bu and Jie Fu, University of Florida, {c.shi, buyuheng, fujie}@ufl.edu"
Pseudocode | No | The paper describes algorithmic steps using mathematical equations and textual explanations, but it does not include any structured pseudocode blocks or algorithm listings labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | "The code of the experiment is available on https://github.com/AronYoung414/leakage_minial_design_MDP"
Open Datasets | No | The paper uses a custom 'Grid World Example' simulation environment and does not specify a publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper describes a simulation environment and experiments but does not provide specific details on dataset splits (e.g., percentages, sample counts, or predefined split citations) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python version, library names, or solver versions) needed to replicate the experiment.
Experiment Setup | Yes | "We set the reward of reaching a goal to be 0.1, the constraint that the total return is greater than or equal to $\delta = 0.3$, and the horizon $T = 10$. We will employ the soft-max policy parameterization, i.e., $\pi_\theta(a \mid s) = \exp(\theta_{s,a}) / \sum_{a' \in A} \exp(\theta_{s,a'})$, where $\theta \in \mathbb{R}^{|S \times A|}$ is the policy parameter vector. As P1 enters these sensor ranges, the observer receives corresponding observations ($b$, $r$, $y$, $g$, respectively) with probability $p = 0.9$ and a null observation ($0$) with probability $1 - p = 0.1$, attributed to the false negative rate of the sensors."
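
The Experiment Setup row quotes the soft-max policy parameterization and the sensor observation model. Below is a minimal Python sketch of those two pieces. Only the values taken from the quoted setup (goal reward 0.1, $\delta = 0.3$, $T = 10$, detection probability $p = 0.9$) come from the paper; the grid size, state/action encodings, sensor layout, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem size for illustration: a 6x6 grid with 4 moves.
N_STATES, N_ACTIONS = 36, 4
theta = np.zeros((N_STATES, N_ACTIONS))  # policy parameters, theta in R^{|S x A|}

# Values quoted from the paper's experiment setup.
GOAL_REWARD = 0.1
DELTA = 0.3       # lower bound on the total return
HORIZON = 10      # planning horizon T
P_DETECT = 0.9    # sensor detection probability p

def policy(theta, s):
    """Soft-max policy: pi_theta(a|s) = exp(theta[s,a]) / sum_a' exp(theta[s,a'])."""
    logits = theta[s] - theta[s].max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def observe(sensor_color):
    """Hypothetical sensor model: inside a sensor range the observer sees the
    sensor's label ('b', 'r', 'y', or 'g') with probability p = 0.9, and the
    null observation '0' with probability 1 - p = 0.1 (sensor false negative)."""
    return sensor_color if rng.random() < P_DETECT else "0"

# Example usage: sample an action at state 0, then an observation while P1
# is inside the blue sensor's range.
a = rng.choice(N_ACTIONS, p=policy(theta, 0))
obs = observe("b")
print(a, obs)
```

With all-zero parameters the policy is uniform over actions; gradient-based opacity-enforcement planning would update `theta` subject to the return constraint $\geq \delta$ over the horizon $T$.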