Compositional Reinforcement Learning from Logical Specifications

Authors: Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate our approach on a rooms environment (with continuous state and action spaces), where a 2D agent must navigate a set of rooms to achieve the specification, as well as a challenging fetch environment where the goal is to use a robot arm to manipulate a block to achieve the specification. We demonstrate that DIRL significantly outperforms state-of-the-art deep RL algorithms for learning policies from specifications, such as SPECTRL, TLTL, QRM and HRM, as well as a state-of-the-art hierarchical RL algorithm, R-AVI, that uses state abstractions, as the complexity of the specification increases.
Researcher Affiliation | Academia | Kishor Jothimurugan, University of Pennsylvania; Suguman Bansal, University of Pennsylvania; Osbert Bastani, University of Pennsylvania; Rajeev Alur, University of Pennsylvania
Pseudocode | Yes | Algorithm 1: Compositional reinforcement learning algorithm for solving abstract reachability.
Open Source Code | Yes | Our implementation is available at https://github.com/keyshor/dirl.
Open Datasets | Yes | We consider the Fetch-Pick-And-Place environment in Open AI Gym [8], consisting of a robotic arm that can grasp objects and a block to manipulate.
Dataset Splits | No | The paper states that training details, including data splits, are in the supplement: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In the supplement.' However, the main text does not specify training/validation/test splits.
Hardware Specification | No | The paper explicitly states in its checklist: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We compare sample-efficiency which is independent of the platform on which the algorithms are run.'
Software Dependencies | No | The paper mentions software such as ARS, TD3, Open AI Gym, SPECTRL, QRM, HRM, TLTL, and R-AVI, but does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | We learn policies using ARS [32] with shaped rewards (see Appendix B); each one is a fully connected NN with 2 hidden layers of 30 neurons each. (...) We learn policies using TD3 [14] with shaped rewards; each one is a fully connected NN with 2 hidden layers of 256 neurons each.
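
To make the "Algorithm 1" row above concrete, here is a minimal sketch of a Dijkstra-style compositional loop for abstract reachability: it interleaves learning a sub-policy for each abstract edge with a shortest-path search over the abstract graph. The `AbstractGraph` interface and the `learn_edge_policy` helper are hypothetical placeholders rather than the authors' API, and taking edge costs as negative log success probabilities is one common way to turn "most reliable path" into a shortest-path problem, not necessarily the paper's exact formulation.

```python
# Sketch of a Dijkstra-style compositional loop for abstract reachability.
# `graph` and `learn_edge_policy` are assumed, hypothetical interfaces.
import heapq
import itertools
import math

def compositional_rl(graph, learn_edge_policy):
    """Learn one sub-policy per abstract edge, exploring vertices in order of cost."""
    counter = itertools.count()              # tie-breaker so heap entries stay comparable
    frontier = [(0.0, next(counter), graph.source)]
    best_cost = {graph.source: 0.0}
    edge_policies = {}

    while frontier:
        cost, _, vertex = heapq.heappop(frontier)
        if vertex == graph.target:
            break                             # most reliable path to the target found
        if cost > best_cost.get(vertex, math.inf):
            continue                          # stale queue entry
        for edge in graph.outgoing(vertex):
            # Train a sub-policy that drives the agent across this abstract edge and
            # returns it together with an estimate of its success probability.
            policy, success_prob = learn_edge_policy(edge)
            edge_policies[edge] = policy
            new_cost = cost - math.log(max(success_prob, 1e-8))
            if new_cost < best_cost.get(edge.target, math.inf):
                best_cost[edge.target] = new_cost
                heapq.heappush(frontier, (new_cost, next(counter), edge.target))

    return edge_policies, best_cost
```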
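
The Open Datasets row refers to the Fetch-Pick-And-Place task in OpenAI Gym. A minimal way to instantiate it is shown below; the environment id and version are an assumption and may differ from the authors' setup, and Gym's MuJoCo-based Fetch tasks require mujoco-py.

```python
import gym

# Environment id is an assumption; the robotics Fetch tasks need mujoco-py installed.
env = gym.make("FetchPickAndPlace-v1")
obs = env.reset()
# Goal-conditioned dict observation with 'observation', 'achieved_goal', 'desired_goal'.
print(obs["observation"].shape, env.action_space)
```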
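
The Experiment Setup row quotes the policy network sizes. As a sketch of those architectures, assuming PyTorch and ReLU/Tanh activations (the paper only fixes two hidden layers of the quoted widths, not the framework or activations):

```python
import torch.nn as nn

def make_policy(obs_dim, act_dim, hidden):
    # Rooms environment (ARS): hidden=30; Fetch environment (TD3): hidden=256.
    # Activation choices are assumptions; actions are assumed bounded in [-1, 1].
    return nn.Sequential(
        nn.Linear(obs_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, act_dim), nn.Tanh(),
    )
```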