A Composable Specification Language for Reinforcement Learning Tasks

Authors: Kishor Jothimurugan, Rajeev Alur, Osbert Bastani

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement our approach in a tool called SPECTRL, and show that it outperforms several state-of-the-art baselines. ... We have implemented SPECTRL, and empirically demonstrated its benefits (Section 4).
Researcher Affiliation | Academia | Kishor Jothimurugan, Rajeev Alur, Osbert Bastani, University of Pennsylvania, {kishor,alur,obastani}@cis.upenn.edu
Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks within the main text.
Open Source Code | Yes | The implementation can be found at https://github.com/keyshor/spectrl_tool.
Open Datasets | Yes | Finally, we applied SPECTRL to a different control task, namely to learn a policy for the version of cart-pole in OpenAI Gym, in which we used continuous actions instead of discrete actions.
Dataset Splits | No | The paper describes learning within a simulated environment and evaluates performance based on the probability of satisfying the specification (estimated using samples). It does not specify explicit training/validation/test dataset splits for a fixed dataset.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU or CPU models, memory) used to run the experiments; it describes only the simulated environments and tasks.
Software Dependencies | No | The paper mentions using augmented random search (ARS) and OpenAI Gym, but it does not specify version numbers for these or any other software dependencies needed for reproducibility.
Experiment Setup | Yes | We consider a dynamical system with states S = ℝ² × ℝ, where (x, r) ∈ S encodes the robot position x and its remaining fuel r, actions A = [−1, 1]², where an action a ∈ A is the robot velocity, and transitions f(x, r, a) = (x + a + ϵ, r − 0.1 · |x₁| · ‖a‖), where ϵ ∼ N(0, σ²I) and the fuel consumed is proportional to the product of speed and distance from the y-axis. The initial state is s₀ = (5, 0, 7), and the horizon is T = 40. ... each N_q has two fully connected hidden layers with 30 neurons each and ReLU activations, and a tanh function as its output layer. We solve this RL problem using augmented random search (ARS)...
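
To make the quoted experiment setup concrete, below is a minimal NumPy sketch of the described dynamics and policy shape. It assumes only what is quoted above: the noise scale SIGMA, the weight initialization, and the names step, initial_state, and make_policy are illustrative placeholders, and the policy function is a plain stand-in for the networks N_q that the authors train with ARS, not the authors' implementation.

```python
import numpy as np

HORIZON = 40   # T = 40 in the quoted setup
SIGMA = 0.05   # noise scale sigma is not given in the quote; placeholder value


def initial_state():
    """Initial state s0 = (5, 0, 7): position (5, 0) and fuel 7."""
    return np.array([5.0, 0.0, 7.0])


def step(state, action, sigma=SIGMA):
    """One transition f(x, r, a) = (x + a + eps, r - 0.1 * |x_1| * ||a||)."""
    x, r = state[:2], state[2]
    a = np.clip(np.asarray(action, dtype=float), -1.0, 1.0)  # actions A = [-1, 1]^2
    eps = np.random.normal(0.0, sigma, size=2)                # eps ~ N(0, sigma^2 I)
    x_next = x + a + eps
    r_next = r - 0.1 * abs(x[0]) * np.linalg.norm(a)          # fuel consumed
    return np.concatenate([x_next, [r_next]])


def make_policy(obs_dim=3, act_dim=2, hidden=30, rng=None):
    """Two fully connected 30-unit ReLU hidden layers and a tanh output, as quoted."""
    rng = np.random.default_rng() if rng is None else rng
    sizes = [obs_dim, hidden, hidden, act_dim]
    params = [(rng.normal(0.0, 0.1, size=(m, n)), np.zeros(n))  # placeholder init
              for m, n in zip(sizes[:-1], sizes[1:])]

    def policy(obs):
        h = np.asarray(obs, dtype=float)
        for i, (W, b) in enumerate(params):
            h = h @ W + b
            h = np.maximum(h, 0.0) if i < len(params) - 1 else np.tanh(h)
        return h

    return params, policy


if __name__ == "__main__":
    # Roll out one episode of length T = 40 with an untrained policy.
    _, policy = make_policy()
    s = initial_state()
    for _ in range(HORIZON):
        s = step(s, policy(s))
    print("final position:", s[:2], "remaining fuel:", s[2])
```

In the paper this rollout would be driven by ARS updates to the flat parameter vector of the policy; the sketch only illustrates the state, action, and transition structure described in the quote.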