Neurosymbolic Reinforcement Learning with Formally Verified Exploration

Authors: Greg Anderson, Abhinav Verma, Isil Dillig, Swarat Chaudhuri

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our empirical evaluation, on a suite of continuous control problems, shows that REVEL enforces safe exploration in many scenarios where established RL algorithms (including CPO [1], which is motivated by safe RL) do not, while discovering policies that outperform policies based on static shields."
Researcher Affiliation | Academia | Greg Anderson (UT Austin, ganderso@cs.utexas.edu); Abhinav Verma (UT Austin, verma@utexas.edu); Isil Dillig (UT Austin, isil@cs.utexas.edu); Swarat Chaudhuri (UT Austin, swarat@cs.utexas.edu)
Pseudocode | Yes | Algorithm 1 (Reinforcement Learning with Formally Verified Exploration, REVEL) and Algorithm 2 (Implementation of PROJECTG). A hedged sketch of the loop structure that Algorithm 1 describes appears below the table.
Open Source Code | Yes | "The current implementation is available at https://github.com/gavlegoat/safe-learning."
Open Datasets | No | "Our experiments used 10 benchmarks that include classic control problems, robotics applications, and benchmarks from prior work [11]. For each of these environments, we hand-constructed a worst-case, piecewise linear model of the dynamics." The paper refers to benchmark environments and hand-constructed models, but does not provide concrete access information (link, DOI, or formal citation with authors and year) for any publicly available dataset used in training. A sketch of one way such a worst-case model can be represented appears below the table.
Dataset Splits | No | The paper discusses training performance on its benchmarks but does not give dataset splits (e.g., percentages or counts for training, validation, or test sets).
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python, library, or solver versions) needed to replicate the experiments.
Experiment Setup | Yes | "Details of hyperparameters that we used appear in the Appendix."
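
The paper's pseudocode itself is not reproduced in this report. For orientation only, here is a minimal sketch of the shielded-exploration structure that Algorithm 1 (REVEL) describes: act with the neural policy, fall back to a verified symbolic shield whenever the proposed action could be unsafe under the worst-case model, update the neural component, and periodically project back into the verifiably safe policy class (what the paper calls PROJECTG, Algorithm 2). Every interface below (`env`, `neural_policy`, `shield`, `worst_case_model`, `project_g`) is a hypothetical placeholder, not the authors' API; the actual implementation lives in the linked repository.

```python
def revel_loop(env, neural_policy, shield, worst_case_model, project_g,
               n_updates=500, horizon=1000):
    """Sketch of the alternation in Algorithm 1: shielded rollouts,
    neural-policy updates, and verified projection of the shield."""
    for _ in range(n_updates):
        state = env.reset()
        trajectory = []
        for _ in range(horizon):
            action = neural_policy.act(state)
            # Shielding: if the proposed action could leave the verified safe
            # region under the worst-case dynamics, use the shield's action.
            if not worst_case_model.action_is_safe(state, action):
                action = shield.act(state)
            next_state, reward, done = env.step(action)
            trajectory.append((state, action, reward))
            state = next_state
            if done:
                break
        # Gradient-style update of the neural component on the safe rollout.
        neural_policy.update(trajectory)
        # Projection step (Algorithm 2 in the paper): recover a shield such
        # that the combined neurosymbolic policy is verifiably safe.
        shield = project_g(neural_policy, shield, worst_case_model)
    return neural_policy, shield
```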
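
For context on the "Open Datasets" row: the environments are not datasets in the usual sense, and the hand-constructed, worst-case, piecewise linear dynamics models are described only in the paper. The sketch below shows one plausible representation of such a model, assuming each piece is an affine update valid on a polyhedral region, with a per-coordinate disturbance bound standing in for worst-case model error. The class and field names are illustrative and do not come from the paper or its repository.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class LinearPiece:
    P: np.ndarray   # polyhedral region: this piece applies where P @ x <= q
    q: np.ndarray
    A: np.ndarray   # affine dynamics on the region: x' = A @ x + B @ u + c
    B: np.ndarray
    c: np.ndarray
    eps: float      # worst-case per-coordinate disturbance bound


class PiecewiseLinearModel:
    """A worst-case environment model given as a list of linear pieces."""

    def __init__(self, pieces):
        self.pieces = pieces

    def step_bounds(self, x, u):
        """Componentwise bounds on the next state under any disturbance
        within the active piece's eps bound (worst-case one-step prediction)."""
        for p in self.pieces:
            if np.all(p.P @ x <= p.q):
                nominal = p.A @ x + p.B @ u + p.c
                return nominal - p.eps, nominal + p.eps
        raise ValueError("state lies outside the modeled region")
```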