Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Compositional Reinforcement Learning from Logical Specifications
Authors: Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on a rooms environment (with continuous state and action spaces), where a 2D agent must navigate a set of rooms to achieve the specification, as well as a challenging fetch environment where the goal is to use a robot arm to manipulate a block to achieve the specification. We demonstrate that DIRL significantly outperforms state-of-the-art deep RL algorithms for learning policies from specifications, such as SPECTRL, TLTL, QRM and HRM, as well as a state-of-the-art hierarchical RL algorithm, R-AVI, that uses state abstractions, as the complexity of the specification increases. |
| Researcher Affiliation | Academia | Kishor Jothimurugan University of Pennsylvania Suguman Bansal University of Pennsylvania Osbert Bastani University of Pennsylvania Rajeev Alur University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 Compositional reinforcement learning algorithm for solving abstract reachability. |
| Open Source Code | Yes | Our implementation is available at https://github.com/keyshor/dirl. |
| Open Datasets | Yes | We consider the Fetch-Pick-And-Place environment in Open AI Gym [8], consisting of a robotic arm that can grasp objects and a block to manipulate. |
| Dataset Splits | No | The paper states that training details including data splits are in the supplement: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In the supplement.' However, the main text does not specify training/validation/test splits. |
| Hardware Specification | No | The paper explicitly states in its checklist: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We compare sample-efficiency which is independent of the platform on which the algorithms are run.' |
| Software Dependencies | No | The paper mentions software like ARS, TD3, Open AI Gym, SPECTRL, QRM, HRM, TLTL, and R-AVI, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We learn policies using ARS [32] with shaped rewards (see Appendix B); each one is a fully connected NN with 2 hidden layers of 30 neurons each. (...) We learn policies using TD3 [14] with shaped rewards; each one is a fully connected NN with 2 hidden layers of 256 neurons each. |
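
The Experiment Setup row describes two policy network shapes: 2 hidden layers of 30 neurons (trained with ARS) and 2 hidden layers of 256 neurons (trained with TD3). The sketch below only illustrates what those shapes mean and how much they differ in size. It is not the authors' implementation: the helper names, the input and output dimensions (4 and 2), the `tanh` activation, and the weight initialization are all assumptions made for illustration, and no training is performed.

```python
import math
import random


def init_mlp(in_dim, out_dim, hidden_width, num_hidden=2, seed=0):
    """Build a fully connected network as a list of (weights, biases) layers.

    Hypothetical helper: Gaussian weight init and zero biases are assumptions.
    """
    rng = random.Random(seed)
    sizes = [in_dim] + [hidden_width] * num_hidden + [out_dim]
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, obs):
    """Map an observation to an action vector."""
    x = list(obs)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            # Assumed activation on hidden layers; the quoted text does not name one.
            x = [math.tanh(v) for v in x]
    return x


def num_parameters(layers):
    """Count all weights and biases in the network."""
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


# Hypothetical observation/action sizes, chosen only for illustration.
small_policy = init_mlp(in_dim=4, out_dim=2, hidden_width=30)
large_policy = init_mlp(in_dim=4, out_dim=2, hidden_width=256)

action = forward(small_policy, [0.0, 0.0, 0.0, 0.0])
small_count = num_parameters(small_policy)
large_count = num_parameters(large_policy)
```

Under these assumed dimensions, the 256-wide network has roughly 60 times as many parameters as the 30-wide one, with the hidden-to-hidden layer accounting for most of the difference.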