Neurosymbolic Reinforcement Learning with Formally Verified Exploration
Authors: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluation, on a suite of continuous control problems, shows that REVEL enforces safe exploration in many scenarios where established RL algorithms (including CPO [1], which is motivated by safe RL) do not, while discovering policies that outperform policies based on static shields. |
| Researcher Affiliation | Academia | Greg Anderson, UT Austin, ganderso@cs.utexas.edu; Abhinav Verma, UT Austin, verma@utexas.edu; Isil Dillig, UT Austin, isil@cs.utexas.edu; Swarat Chaudhuri, UT Austin, swarat@cs.utexas.edu |
| Pseudocode | Yes | Algorithm 1: Reinforcement Learning with Formally Verified Exploration (REVEL), and Algorithm 2: Implementation of PROJECTG |
| Open Source Code | Yes | The current implementation is available at https://github.com/gavlegoat/safe-learning. |
| Open Datasets | No | Our experiments used 10 benchmarks that include classic control problems, robotics applications, and benchmarks from prior work [11]. For each of these environments, we hand-constructed a worst-case, piecewise linear model of the dynamics. The paper refers to benchmark environments and hand-constructed models, but does not provide concrete access information (link, DOI, or formal citation with authors/year for the specific datasets used for training) for any publicly available datasets. |
| Dataset Splits | No | The paper discusses training performance and benchmarks but does not provide specific details on dataset splits (e.g., percentages or counts for training, validation, or test sets). |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, library versions, or specific solver versions) needed to replicate the experiments. |
| Experiment Setup | Yes | Details of hyperparameters that we used appear in the Appendix. |