Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

HypRL: Reinforcement Learning of Control Policies for Hyperproperties

Authors: Tzu-Han Hsu, Arshia Rafieioskouei, Borzoo Bonakdarpour

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate HYPRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. We also compare with specification-driven baselines to demonstrate the effectiveness and efficiency of HYPRL.
Researcher Affiliation | Academia | Tzu-Han Hsu Arshia Rafieioskouei Borzoo Bonakdarpour Michigan State University EMAIL
Pseudocode | No | The paper describes algorithmic details in Section 5 ('Algorithmic Details of HYPRL') through prose and mathematical formulations (e.g., Equations 1, 2, 3), and provides a comprehensive example in Appendix C. However, it does not present a distinct, structured pseudocode or algorithm block with a clear label such as 'Algorithm 1'.
Open Source Code | No | NeurIPS Paper Checklist, Question 5: 'Open access to data and code'. Answer: [No]. Justification: 'Our implementation can be reproduced by following the detailed algorithmic steps in Section 5 and the elaborations of experimental setups for each case in the Appendix.'
Open Datasets | Yes | We evaluate HYPRL on a diverse set of benchmarks, including safety-aware planning, Deep Sea Treasure, and the Post Correspondence Problem. ... For DST, we use the reward function from [42]; ... For SRL, we use the maps from [35]... This case study (see Figure 9) was originally proposed by [42]. We adopt the implementation provided in [46].
Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits with percentages, sample counts, or citations to predefined splits. It describes experimental setups for simulated environments, including episode lengths and environment sizes (e.g., 'Each episode consists of 300 steps', 'We conduct our experiments on 3x3 and 5x5 grid-world environments'), but not specific dataset splits for reproduction.
Hardware Specification | Yes | All experiments are ran on an Apple M1 Max (10-core CPU, 24-core GPU).
Software Dependencies | No | The optimal tuple of policies π_1, ..., π_n are learned iteratively using a selected RL algorithm such as DQN [36], PPO [39], or CQ-Learning [15] (Section 5.3). ... For our experiments, we used PPO and DQN implementations from [38]. ... The implementation and hyperparameters of CQ-Learning are taken from [17]. The paper mentions specific algorithms and references papers where their implementations can be found, but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | D.2 Safe RL (SRL): We employ DQN as our learning algorithm, utilizing a neural network with three hidden layers of 512 nodes and ReLU activation functions. We set the discount factor to γ = 1.0, the learning rate to 0.001, the initial ϵ to 1.0 with a decay rate of 0.995 down to a minimum of 0.01, and use the Adam optimizer. ... D.3 Deep Sea Treasure (DST): For DQN, we set the hyperparameters as follows: discount factor γ = 0.99, ϵ decaying from 1.0 to 0.07 over a fraction of 0.2, and a learning rate of 0.0004. ... For PPO, we set hyperparameters as following, γ = 0.99, clipping factor to 0.2, learning rate to 0.0003, and GAE lambda to 0.98.
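
The SRL exploration schedule quoted in the Experiment Setup entry (initial ϵ of 1.0, decay rate 0.995, minimum 0.01) can be illustrated with a short calculation. This is only a sketch under an assumption: the quoted text does not say whether the decay is applied per step or per episode, so the code treats it as one multiplicative decay per "update". The function names are hypothetical and do not come from the paper.

```python
import math

# Values quoted in the Experiment Setup entry for SRL (DQN).
EPS_START = 1.0
EPS_DECAY = 0.995
EPS_MIN = 0.01


def epsilon_after(updates: int) -> float:
    """Exploration rate after `updates` decays, floored at EPS_MIN.

    Assumption: epsilon is multiplied by EPS_DECAY once per update.
    """
    return max(EPS_MIN, EPS_START * EPS_DECAY ** updates)


def updates_to_floor() -> int:
    """Smallest number of decays after which epsilon sits at EPS_MIN."""
    return math.ceil(math.log(EPS_MIN / EPS_START) / math.log(EPS_DECAY))


if __name__ == "__main__":
    print(updates_to_floor())  # 919
    print(epsilon_after(100))
```

Under this assumption the schedule reaches its floor after 919 updates, which gives a rough sense of how long exploration lasts relative to the training budget.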