Compositional Reinforcement Learning from Logical Specifications
Authors: Kishor Jothimurugan, Suguman Bansal, Osbert Bastani, Rajeev Alur
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our approach on a rooms environment (with continuous state and action spaces), where a 2D agent must navigate a set of rooms to achieve the specification, as well as a challenging fetch environment where the goal is to use a robot arm to manipulate a block to achieve the specification. We demonstrate that DIRL significantly outperforms state-of-the-art deep RL algorithms for learning policies from specifications, such as SPECTRL, TLTL, QRM and HRM, as well as a state-of-the-art hierarchical RL algorithm, R-AVI, that uses state abstractions, as the complexity of the specification increases. |
| Researcher Affiliation | Academia | Kishor Jothimurugan University of Pennsylvania Suguman Bansal University of Pennsylvania Osbert Bastani University of Pennsylvania Rajeev Alur University of Pennsylvania |
| Pseudocode | Yes | Algorithm 1 Compositional reinforcement learning algorithm for solving abstract reachability. |
| Open Source Code | Yes | Our implementation is available at https://github.com/keyshor/dirl. |
| Open Datasets | Yes | We consider the Fetch-Pick-And-Place environment in Open AI Gym [8], consisting of a robotic arm that can grasp objects and a block to manipulate. |
| Dataset Splits | No | The paper states that training details including data splits are in the supplement: 'Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] In the supplement.' However, the main text does not specify training/validation/test splits. |
| Hardware Specification | No | The paper explicitly states in its checklist: 'Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [N/A] We compare sample-efficiency which is independent of the platform on which the algorithms are run.' |
| Software Dependencies | No | The paper mentions software like ARS, TD3, Open AI Gym, SPECTRL, QRM, HRM, TLTL, and R-AVI, but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | We learn policies using ARS [32] with shaped rewards (see Appendix B); each one is a fully connected NN with 2 hidden layers of 30 neurons each. (...) We learn policies using TD3 [14] with shaped rewards; each one is a fully connected NN with 2 hidden layers of 256 neurons each. |
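
The Open Datasets row above references the Fetch-Pick-And-Place task from OpenAI Gym. As a hedged illustration (not taken from the authors' DIRL codebase), a minimal sketch of instantiating that environment is shown below; the environment ID `FetchPickAndPlace-v1`, the pre-0.26 Gym step API, and the MuJoCo dependency are assumptions based on the standard Gym robotics suite.

```python
# Minimal sketch (assumption): instantiating the Gym robotics task referenced
# in the Open Datasets row. Requires the Gym robotics extras (MuJoCo bindings).
import gym

env = gym.make("FetchPickAndPlace-v1")  # environment ID assumed from the standard Gym robotics suite
obs = env.reset()

# Roll out a few random actions to exercise the continuous action space.
for _ in range(5):
    action = env.action_space.sample()
    obs, reward, done, info = env.step(action)  # pre-0.26 Gym step signature assumed
    if done:
        obs = env.reset()
env.close()
```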
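
The Experiment Setup row specifies fully connected policy networks with two hidden layers: 30 units each for the rooms environment (trained with ARS) and 256 units each for the fetch environment (trained with TD3). A minimal PyTorch sketch of such an architecture follows; the class name, activation functions, and the observation/action dimensions are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumption): fully connected policy matching the hidden-layer
# widths quoted in the Experiment Setup row; activations and I/O dims are guesses.
import torch
import torch.nn as nn

class MLPPolicy(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim), nn.Tanh(),  # bounded continuous actions
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

# Rooms environment policy (trained with ARS in the paper): 2 hidden layers of 30 units.
rooms_policy = MLPPolicy(obs_dim=2, act_dim=2, hidden=30)    # obs/act dims assumed
# Fetch environment policy (trained with TD3 in the paper): 2 hidden layers of 256 units.
fetch_policy = MLPPolicy(obs_dim=25, act_dim=4, hidden=256)  # obs/act dims assumed
```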