Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Authors: Chongyang Shi, Yuheng Bu, Jie Fu

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiment Evaluation. Example 1 (Grid World Example). The effectiveness of the proposed optimal opacity-enforcement planning algorithms is illustrated through a stochastic grid world example shown in Figure 1.
Researcher Affiliation | Academia | Chongyang Shi, Yuheng Bu and Jie Fu; University of Florida; EMAIL
Pseudocode | No | The paper describes algorithmic steps using mathematical equations and textual explanations, but it does not include any structured pseudocode blocks or algorithm listings labeled as 'Algorithm' or 'Pseudocode'.
Open Source Code | Yes | Footnote 1: The code of the experiment is available on https://github.com/AronYoung414/leakage_minial_design_MDP
Open Datasets | No | The paper uses a custom 'Grid World Example' simulation environment and does not specify a publicly available dataset with concrete access information (link, DOI, or formal citation).
Dataset Splits | No | The paper describes a simulation environment and experiments but does not provide specific details on dataset splits (e.g., percentages, sample counts, or predefined split citations) for training, validation, or testing.
Hardware Specification | No | The paper does not provide any specific hardware details (such as GPU/CPU models, memory, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., Python version, library names, or solver versions) needed to replicate the experiment.
Experiment Setup | Yes | We set the reward of reaching a goal to be 0.1 and the constraint that the total return is greater than or equals $\delta = 0.3$, and the horizon $T = 10$. We will employ the soft-max policy parameterization, i.e., $\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a' \in A} \exp(\theta_{s,a'})}$, where $\theta \in \mathbb{R}^{|S \times A|}$ is the policy parameter vector. As P1 enters these sensor ranges, the observer receives corresponding observations ("b", "r", "y", "g", respectively) with probability $p = 0.9$ and a null observation ("0") with probability $1 - p = 0.1$, attributed to the false negative rate of the sensors.
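The soft-max policy and the sensor model quoted in the Experiment Setup entry can be illustrated with a minimal Python sketch. This is not taken from the authors' repository. The dictionary layout of theta, the state name "s0", the action names, and the function names softmax_policy and sensor_observation are all assumptions made for illustration. Only the soft-max form, p = 0.9, and the observation labels come from the quoted text.

```python
import math
import random


def softmax_policy(theta, state, actions):
    """Return {action: pi_theta(action | state)} under the soft-max form.

    theta is a dict keyed by (state, action); this layout is hypothetical.
    """
    logits = [theta[(state, a)] for a in actions]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return {a: e / total for a, e in zip(actions, exps)}


def sensor_observation(label, p=0.9, rng=random):
    """Emit the sensor's label with probability p, else the null observation "0"."""
    return label if rng.random() < p else "0"


# Hypothetical state and action names, used only to exercise the functions.
ACTIONS = ["N", "S", "E", "W"]
theta = {("s0", a): 0.0 for a in ACTIONS}
theta[("s0", "E")] = 1.0

probs = softmax_policy(theta, "s0", ACTIONS)

# Draw observations from the "b" sensor with a seeded generator.
rng = random.Random(0)
observations = [sensor_observation("b", 0.9, rng) for _ in range(10000)]
```

Here probs sums to one and places the most mass on "E", the only action with a nonzero parameter. About 90% of the sampled observations should be "b" and the rest the null observation "0".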