reproducibilityindex.ai

Programmatically Interpretable Reinforcement Learning

Authors: Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri

ICML 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL.
Researcher Affiliation	Collaboration	(1) Rice University, (2) Google Brain, (3) DeepMind.
Pseudocode	Yes	We show pseudocode for NDPS in Algorithm 1. The inputs to the algorithm are a POMDP M, a neural policy e_N for M that serves as an oracle, and a sketch S.
Open Source Code	No	The paper contains no explicit statement about code release and no links to repositories.
Open Datasets	Yes	We use NDPS to generate controllers for cars in The Open Racing Car Simulator (TORCS) (Wymann et al., 2014).
Dataset Splits	No	The paper describes the process of generating data (histories) during policy learning and evaluation within the TORCS environment, but it does not specify explicit training, validation, and test splits of a dataset in terms of percentages or counts for reproducibility of the data partitioning.
Hardware Specification	No	The paper mentions using the TORCS environment and a DDPG network, and discusses the size of their neural network, but it does not provide specific hardware details such as GPU/CPU models, memory specifications, or types of computing infrastructure used for the experiments.
Software Dependencies	No	The paper mentions various algorithms and tools used, such as Deep Deterministic Policy Gradients (DDPG), Bayesian optimization, SMT solving, and Reluplex, but it does not provide specific version numbers for any of the software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used in the implementation.
Experiment Setup	Yes	In its full generality TORCS provides a rich environment with input from up to 89 sensors, and optionally the 3D graphic from a chosen camera angle in the race. The controllers have to decide the values of 5 parameters during game play, which correspond to the acceleration, brake, clutch, gear and steering of the car. [...] Here we consider the input from 29 sensors, and decide values for the acceleration and steering actions. [...] The sketches used in our experiments are as in the example in Section 2, and provide the basic structure of a proportional-integral-derivative (PID) program, with appropriate holes for parameter and observation values. To obtain a practical implementation, we constrain the fold calculation to the five latest observations of the history.
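
The Experiment Setup entry describes a PID-shaped sketch whose fold is restricted to the five latest observations. The following Python is a minimal illustrative sketch of that idea, not the authors' implementation: the function names (fold_sum, pid_action), the gain parameters (kp, ki, kd), and the target value are assumptions made for this example, standing in for the "holes" that the paper's search would fill.

```python
from functools import reduce
from typing import Sequence

# The source states that the fold is constrained to the five latest observations.
WINDOW = 5


def fold_sum(values: Sequence[float]) -> float:
    """Fold with + over a sequence (hypothetical helper for the integral term)."""
    return reduce(lambda acc, x: acc + x, values, 0.0)


def pid_action(history: Sequence[float], target: float,
               kp: float, ki: float, kd: float) -> float:
    """PID-shaped action for one sensor history (illustrative assumption).

    history holds past readings of a single sensor, oldest first.
    target, kp, ki and kd play the role of the sketch's parameter holes.
    """
    if not history:
        raise ValueError("history must contain at least one observation")

    recent = list(history[-WINDOW:])           # keep only the latest five readings
    errors = [target - x for x in recent]      # error of each retained reading

    proportional = errors[-1]                  # error at the latest reading
    integral = fold_sum(errors)                # accumulated error over the window
    # Change in error between the last two readings:
    # (target - x[-1]) - (target - x[-2]) == x[-2] - x[-1]
    derivative = recent[-2] - recent[-1] if len(recent) >= 2 else 0.0

    return kp * proportional + ki * integral + kd * derivative
```

Truncating the history before folding keeps the integral term bounded and the cost per step constant, which is one plausible reading of why the paper calls the five-observation window a practical implementation choice.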