Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Authors: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation, on a suite of continuous control problems, shows that REVEL enforces safe exploration in many scenarios where established RL algorithms (including CPO [1], which is motivated by safe RL) do not, while discovering policies that outperform policies based on static shields. |
| Researcher Affiliation | Academia | Greg Anderson UT Austin EMAIL Abhinav Verma UT Austin EMAIL Isil Dillig UT Austin EMAIL Swarat Chaudhuri UT Austin EMAIL |
| Pseudocode | Yes | Algorithm 1 Reinforcement Learning with Formally Verified Exploration (REVEL) and Algorithm 2 Implementation of PROJECTG |
| Open Source Code | Yes | The current implementation is available at https://github.com/gavlegoat/safe-learning. |
| Open Datasets | No | Our experiments used 10 benchmarks that include classic control problems, robotics applications, and benchmarks from prior work [11]. For each of these environments, we hand-constructed a worst-case, piecewise linear model of the dynamics. The paper refers to benchmark environments and hand-constructed models, but does not provide concrete access information (link, DOI, or formal citation with authors/year for the specific datasets used for training) for any publicly available datasets. |
| Dataset Splits | No | The paper discusses training performance and benchmarks but does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions, or specific solver versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Details of hyperparameters that we used appear in the Appendix. |