Compositional Reinforcement Learning from Logical Specifications
Authors: Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on a rooms environment (with continuous state and action spaces), where a 2D agent must navigate a set of rooms to achieve the specification, as well as a challenging fetch environment where the goal is to use a robot arm to manipulate a block to achieve the specification. We demonstrate that DIRL significantly outperforms state-of-the-art deep RL algorithms for learning policies from specifications, such as SPECTRL, TLTL, QRM and HRM, as well as a state-of-the-art hierarchical RL algorithm, R-AVI, that uses state abstractions, as the complexity of the specification increases. |
| Researcher Affiliation | Academia | Kishor Jothimurugan University of Pennsylvania Suguman Bansal University of Pennsylvania Osbert Bastani University of Pennsylvania Rajeev Alur University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 Compositional reinforcement learning algorithm for solving abstract reachability. |
| Open Source Code | Yes | Our implementation is available at https://github.com/keyshor/dirl. |
| Open Datasets | Yes | We consider the Fetch-Pick-And-Place environment in Open AI Gym [8], consisting of a robotic arm that can grasp objects and a block to manipulate. |
| Dataset Splits | No | The paper states that training details including data splits are in the supplement: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In the supplement.' However, the main text does not specify training/validation/test splits. |
| Hardware Specification | No | The paper explicitly states in its checklist: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We compare sample-efficiency which is independent of the platform on which the algorithms are run.' |
| Software Dependencies | No | The paper mentions software like ARS, TD3, Open AI Gym, SPECTRL, QRM, HRM, TLTL, and R-AVI, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We learn policies using ARS [32] with shaped rewards (see Appendix B); each one is a fully connected NN with 2 hidden layers of 30 neurons each. (...) We learn policies using TD3 [14] with shaped rewards; each one is a fully connected NN with 2 hidden layers of 256 neurons each. |
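
The Open Datasets row above references the Fetch-Pick-And-Place task from OpenAI Gym. As a hedged illustration (not taken from the authors' DIRL codebase), a minimal sketch of instantiating that environment is shown below; the environment ID `FetchPickAndPlace-v1`, the pre-0.26 Gym step API, and the MuJoCo dependency are assumptions based on the standard Gym robotics suite.

```python
# Minimal sketch (assumption): instantiating the Gym robotics task referenced
# in the Open Datasets row. Requires the Gym robotics extras (MuJoCo bindings).
import gym

env = gym.make("FetchPickAndPlace-v1")  # environment ID assumed from the standard Gym robotics suite
obs = env.reset()

# Roll out a few random actions to exercise the continuous action space.
for _ in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)  # pre-0.26 Gym step signature assumed
    if done:
        obs = env.reset()
env.close()
```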
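
The Experiment Setup row specifies fully connected policy networks with two hidden layers: 30 units each for the rooms environment (trained with ARS) and 256 units each for the fetch environment (trained with TD3). A minimal PyTorch sketch of such an architecture follows; the class name, activation functions, and the observation/action dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption): fully connected policy matching the hidden-layer
# widths quoted in the Experiment Setup row; activations and I/O dims are guesses.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Rooms environment policy (trained with ARS in the paper): 2 hidden layers of 30 units.
rooms_policy = MLPPolicy(obs_dim=2, act_dim=2, hidden=30)    # obs/act dims assumed
# Fetch environment policy (trained with TD3 in the paper): 2 hidden layers of 256 units.
fetch_policy = MLPPolicy(obs_dim=25, act_dim=4, hidden=256)  # obs/act dims assumed
```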