Information-Theoretic Opacity-Enforcement in Markov Decision Processes
Authors: Chongyang Shi, Yuheng Bu, Jie Fu
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5, Experiment Evaluation, Example 1 (Grid World Example): "The effectiveness of the proposed optimal opacity-enforcement planning algorithms is illustrated through a stochastic grid world example shown in Figure 1." |
| Researcher Affiliation | Academia | Chongyang Shi, Yuheng Bu and Jie Fu, University of Florida, {c.shi, buyuheng, fujie}@ufl.edu |
| Pseudocode | No | The paper describes algorithmic steps using mathematical equations and textual explanations, but it does not include any structured pseudocode blocks or algorithm listings labeled as 'Algorithm' or 'Pseudocode'. |
| Open Source Code | Yes | The code of the experiment is available on https://github.com/AronYoung414/leakage_minial_design_MDP |
| Open Datasets | No | The paper uses a custom 'Grid World Example' simulation environment and does not specify a publicly available dataset with concrete access information (link, DOI, or formal citation). |
| Dataset Splits | No | The paper describes a simulation environment and experiments but does not provide specific details on dataset splits (e.g., percentages, sample counts, or predefined split citations) for training, validation, or testing. |
| Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models, memory, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python version, library names, or solver versions) needed to replicate the experiment. |
| Experiment Setup | Yes | We set the reward of reaching a goal to be 0.1, the constraint that the total return is greater than or equal to δ = 0.3, and the horizon T = 10. We will employ the soft-max policy parameterization, i.e., $\pi_\theta(a \mid s) = \exp(\theta_{s,a}) / \sum_{a' \in A} \exp(\theta_{s,a'})$, where $\theta \in \mathbb{R}^{\lvert S \times A \rvert}$ is the policy parameter vector. As P1 enters these sensor ranges, the observer receives the corresponding observations (b, r, y, g, respectively) with probability p = 0.9 and a null observation (0) with probability 1 − p = 0.1, attributed to the false negative rate of the sensors. |
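
The soft-max policy parameterization and the noisy sensor model quoted in the Experiment Setup row can be made concrete with a short sketch. The snippet below is illustrative only and is not taken from the authors' repository: the tabular parameter array `theta` of shape |S| × |A|, the function names `softmax_policy` and `sample_observation`, and the example state/action sizes are assumptions based on the setup described above.

```python
import numpy as np

def softmax_policy(theta: np.ndarray, state: int) -> np.ndarray:
    """Soft-max policy: pi_theta(a|s) = exp(theta[s, a]) / sum_a' exp(theta[s, a'])."""
    logits = theta[state] - theta[state].max()  # subtract max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_observation(sensor_color, p: float = 0.9, rng=None) -> str:
    """Noisy sensor: emit the sensor's color with probability p, or the null
    observation '0' with probability 1 - p (false negative). Outside every
    sensor range (sensor_color is None), the observer receives '0'."""
    rng = np.random.default_rng() if rng is None else rng
    if sensor_color is None:
        return "0"
    return sensor_color if rng.random() < p else "0"

# Example usage with hypothetical grid-world sizes (16 states, 4 actions).
rng = np.random.default_rng(0)
theta = rng.normal(size=(16, 4))          # policy parameter vector, one row per state
print(softmax_policy(theta, state=0))     # action distribution at state 0
print(sample_observation("b", rng=rng))   # 'b' with probability 0.9, '0' otherwise
```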