Programmatically Interpretable Reinforcement Learning
Authors: Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also show that PIRL policies can have smoother trajectories, and can be more easily transferred to environments not encountered during training, than corresponding policies discovered by DRL. |
| Researcher Affiliation | Collaboration | ¹Rice University, ²Google Brain, ³DeepMind. |
| Pseudocode | Yes | We show pseudocode for NDPS in Algorithm 1. The inputs to the algorithm are a POMDP M, a neural policy e_N for M that serves as an oracle, and a sketch S. |
| Open Source Code | No | The paper contains no explicit statement about code release and no links to code repositories. |
| Open Datasets | Yes | We use NDPS to generate controllers for cars in The Open Racing Car Simulator (TORCS) (Wymann et al., 2014). |
| Dataset Splits | No | The paper describes the process of generating data (histories) during policy learning and evaluation within the TORCS environment, but it does not specify explicit training, validation, and test *splits* of a *dataset* in terms of percentages or counts for reproducibility of the data partitioning. |
| Hardware Specification | No | The paper mentions using the TORCS environment and a DDPG network, and discusses the size of their neural network, but it does not provide specific hardware details such as GPU/CPU models, memory specifications, or types of computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper mentions various algorithms and tools used, such as Deep Deterministic Policy Gradients (DDPG), Bayesian optimization, SMT solving, and Reluplex, but it does not provide specific version numbers for any of the software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used in the implementation. |
| Experiment Setup | Yes | In its full generality TORCS provides a rich environment with input from up to 89 sensors, and optionally the 3D graphic from a chosen camera angle in the race. The controllers have to decide the values of 5 parameters during game play, which correspond to the acceleration, brake, clutch, gear and steering of the car. [...] Here we consider the input from 29 sensors, and decide values for the acceleration and steering actions. [...] The sketches used in our experiments are as in the example in Section 2, and provide the basic structure of a proportional-integral-derivative (PID) program, with appropriate holes for parameter and observation values. To obtain a practical implementation, we constrain the fold calculation to the five latest observations of the history. |
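
The Experiment Setup row describes sketches that give the basic structure of a PID program, with the fold calculation constrained to the five latest observations of the history. A minimal Python sketch of what such a program might compute is shown here. It is an illustration under stated assumptions, not the authors' implementation: the function names `peek`, `fold`, and `pid_action`, the gains `kp`, `ki`, `kd`, and the `target` value are all hypothetical placeholders for the sketch's parameter and observation holes.

```python
from functools import reduce
import operator

# From the source: the fold is constrained to the five latest observations.
WINDOW = 5


def peek(history, offset):
    """Return one observation by negative offset (-1 is the most recent)."""
    return history[offset]


def fold(op, values, init=0.0):
    """Combine a sequence of values with a binary operator."""
    return reduce(op, values, init)


def pid_action(history, target, kp, ki, kd):
    """Hypothetical PID-style action over a single sensor's history.

    history: past readings of one sensor, oldest first (must be non-empty).
    target, kp, ki, kd: assumed stand-ins for the sketch's parameter holes.
    """
    recent = list(history)[-WINDOW:]
    errors = [target - x for x in recent]

    proportional = kp * errors[-1]
    integral = ki * fold(operator.add, errors)
    # Change in error between the last two steps: (target - x_t) - (target - x_{t-1}).
    if len(recent) >= 2:
        derivative = kd * (peek(recent, -2) - peek(recent, -1))
    else:
        derivative = 0.0
    return proportional + integral + derivative
```

For example, `pid_action([0.1, 0.2, 0.3, 0.4, 0.5, 0.6], target=0.5, kp=1.0, ki=0.1, kd=2.0)` uses only the last five readings. In the setup described above, one such expression would be needed for each of the two controlled actions, acceleration and steering.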