A Composable Specification Language for Reinforcement Learning Tasks

Authors: Kishor Jothimurugan, Rajeev Alur, Osbert Bastani

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines. ... We have implemented SPECTRL, and empirically demonstrated its benefits (Section 4).
Researcher Affiliation | Academia | Kishor Jothimurugan, Rajeev Alur, Osbert Bastani, University of Pennsylvania, {kishor,alur,obastani}@cis.upenn.edu
Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks within the main text.
Open Source Code | Yes | The implementation can be found at https://github.com/keyshor/spectrl_tool.
Open Datasets | Yes | Finally, we applied SPECTRL to a different control task, namely to learn a policy for the version of cart-pole in OpenAI Gym, in which we used continuous actions instead of discrete actions.
Dataset Splits | No | The paper describes learning within a simulated environment and evaluates performance based on the probability of satisfying the specification (estimated using samples). It does not specify explicit training/validation/test dataset splits for a fixed dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models, memory) used to run the experiments; it describes only the simulated environments and tasks.
Software Dependencies | No | The paper mentions using augmented random search (ARS) and OpenAI Gym, but it does not specify version numbers for these or any other software dependencies needed for reproducibility.
Experiment Setup | Yes | We consider a dynamical system with states S = ℝ² × ℝ, where (x, r) ∈ S encodes the robot position x and its remaining fuel r, actions A = [−1, 1]², where an action a ∈ A is the robot velocity, and transitions f(x, r, a) = (x + a + ϵ, r − 0.1 · |x₁| · ‖a‖), where ϵ ∼ N(0, σ²I) and the fuel consumed is proportional to the product of speed and distance from the y-axis. The initial state is s₀ = (5, 0, 7), and the horizon is T = 40. ... each N_q has two fully connected hidden layers with 30 neurons each and ReLU activations, and a tanh function as its output layer. We solve this RL problem using augmented random search (ARS)...
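
To make the quoted experiment setup concrete, below is a minimal NumPy sketch of the described dynamics and policy shape. It assumes only what is quoted above: the noise scale SIGMA, the weight initialization, and the names step, initial_state, and make_policy are illustrative placeholders, and the policy function is a plain stand-in for the networks N_q that the authors train with ARS, not the authors' implementation.

```python
import numpy as np

HORIZON = 40   # T = 40 in the quoted setup
SIGMA = 0.05   # noise scale sigma is not given in the quote; placeholder value


def initial_state():
    """Initial state s0 = (5, 0, 7): position (5, 0) and fuel 7."""
    return np.array([5.0, 0.0, 7.0])


def step(state, action, sigma=SIGMA):
    """One transition f(x, r, a) = (x + a + eps, r - 0.1 * |x_1| * ||a||)."""
    x, r = state[:2], state[2]
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)  # actions A = [-1, 1]^2
    eps = np.random.normal(0.0, sigma, size=2)                # eps ~ N(0, sigma^2 I)
    x_next = x + a + eps
    r_next = r - 0.1 * abs(x[0]) * np.linalg.norm(a)          # fuel consumed
    return np.concatenate([x_next, [r_next]])


def make_policy(obs_dim=3, act_dim=2, hidden=30, rng=None):
    """Two fully connected 30-unit ReLU hidden layers and a tanh output, as quoted."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = [obs_dim, hidden, hidden, act_dim]
    params = [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))  # placeholder init
              for m, n in zip(sizes[:-1], sizes[1:])]

    def policy(obs):
        h = np.asarray(obs, dtype=float)
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            h = np.maximum(h, 0.0) if i < len(params) - 1 else np.tanh(h)
        return h

    return params, policy


if __name__ == "__main__":
    # Roll out one episode of length T = 40 with an untrained policy.
    _, policy = make_policy()
    s = initial_state()
    for _ in range(HORIZON):
        s = step(s, policy(s))
    print("final position:", s[:2], "remaining fuel:", s[2])
```

In the paper this rollout would be driven by ARS updates to the flat parameter vector of the policy; the sketch only illustrates the state, action, and transition structure described in the quote.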