Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that has been validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias, so scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Authors: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation, on a suite of continuous control problems, shows that REVEL enforces safe exploration in many scenarios where established RL algorithms (including CPO [1], which is motivated by safe RL) do not, while discovering policies that outperform policies based on static shields. |
| Researcher Affiliation | Academia | Greg Anderson UT Austin EMAIL Abhinav Verma UT Austin EMAIL Isil Dillig UT Austin EMAIL Swarat Chaudhuri UT Austin EMAIL |
| Pseudocode | Yes | Algorithm 1 Reinforcement Learning with Formally Verified Exploration (REVEL) and Algorithm 2 Implementation of PROJECTG |
| Open Source Code | Yes | The current implementation is available at https://github.com/gavlegoat/safe-learning. |
| Open Datasets | No | Our experiments used 10 benchmarks that include classic control problems, robotics applications, and benchmarks from prior work [11]. For each of these environments, we hand-constructed a worst-case, piecewise linear model of the dynamics. The paper refers to benchmark environments and hand-constructed models, but it does not provide concrete access information (a link, DOI, or formal citation with authors and year) for any publicly available dataset used in training. |
| Dataset Splits | No | The paper discusses training performance and benchmarks but does not specify dataset splits (e.g., percentages or counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions, or specific solver versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Details of hyperparameters that we used appear in the Appendix. |