Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Authors: Chongyang Shi, Yuheng Bu, Jie Fu

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Section 5 (Experiment Evaluation), Example 1 (Grid World Example): "The effectiveness of the proposed optimal opacity-enforcement planning algorithms is illustrated through a stochastic grid world example shown in Figure 1."
Researcher Affiliation | Academia | "Chongyang Shi, Yuheng Bu and Jie Fu, University of Florida, {c.shi, buyuheng, fujie}@ufl.edu"
Pseudocode | No | The paper describes algorithmic steps using mathematical equations and textual explanations, but it does not include any structured pseudocode blocks or algorithm listings labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | "The code of the experiment is available on https://github.com/AronYoung414/leakage_minial_design_MDP"
Open Datasets | No | The paper uses a custom 'Grid World Example' simulation environment and does not specify a publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper describes a simulation environment and experiments but does not provide specific details on dataset splits (e.g., percentages, sample counts, or predefined split citations) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python version, library names, or solver versions) needed to replicate the experiment.
Experiment Setup | Yes | "We set the reward of reaching a goal to be 0.1, the constraint that the total return is greater than or equal to $\delta = 0.3$, and the horizon $T = 10$. We will employ the soft-max policy parameterization, i.e., $\pi_\theta(a \mid s) = \exp(\theta_{s,a}) / \sum_{a' \in A} \exp(\theta_{s,a'})$, where $\theta \in \mathbb{R}^{|S \times A|}$ is the policy parameter vector. As P1 enters these sensor ranges, the observer receives corresponding observations ($b$, $r$, $y$, $g$, respectively) with probability $p = 0.9$ and a null observation ($0$) with probability $1 - p = 0.1$, attributed to the false negative rate of the sensors."
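
The Experiment Setup row quotes the soft-max policy parameterization and the sensor observation model. Below is a minimal Python sketch of those two pieces. Only the values taken from the quoted setup (goal reward 0.1, $\delta = 0.3$, $T = 10$, detection probability $p = 0.9$) come from the paper; the grid size, state/action encodings, sensor layout, and all function names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed problem size for illustration: a 6x6 grid with 4 moves.
N_STATES, N_ACTIONS = 36, 4
theta = np.zeros((N_STATES, N_ACTIONS))  # policy parameters, theta in R^{|S x A|}

# Values quoted from the paper's experiment setup.
GOAL_REWARD = 0.1
DELTA = 0.3       # lower bound on the total return
HORIZON = 10      # planning horizon T
P_DETECT = 0.9    # sensor detection probability p

def policy(theta, s):
    """Soft-max policy: pi_theta(a|s) = exp(theta[s,a]) / sum_a' exp(theta[s,a'])."""
    logits = theta[s] - theta[s].max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def observe(sensor_color):
    """Hypothetical sensor model: inside a sensor range the observer sees the
    sensor's label ('b', 'r', 'y', or 'g') with probability p = 0.9, and the
    null observation '0' with probability 1 - p = 0.1 (sensor false negative)."""
    return sensor_color if rng.random() < P_DETECT else "0"

# Example usage: sample an action at state 0, then an observation while P1
# is inside the blue sensor's range.
a = rng.choice(N_ACTIONS, p=policy(theta, 0))
obs = observe("b")
print(a, obs)
```

With all-zero parameters the policy is uniform over actions; gradient-based opacity-enforcement planning would update `theta` subject to the return constraint $\geq \delta$ over the horizon $T$.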