Explainable Reinforcement Learning via Model Transforms
Authors: Mira Finkelstein, Nitsan Levy, Lucy Liu, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenschein, Sarah Keren
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the approach on a set of standard benchmarks. |
| Researcher Affiliation | Collaboration | (1) The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering; (2) Harvard University, School of Engineering and Applied Sciences; (3) DeepMind; (4) Technion - Israel Institute of Technology, Taub Faculty of Computer Science |
| Pseudocode | No | The paper describes methods such as Dijkstra-like search and policy updates in text, but it does not include a formal pseudocode block or an algorithm figure. |
| Open Source Code | Yes | Our complete dataset and code can be found at https://github.com/sarah-keren/RLPE.git |
| Open Datasets | Yes | Environments: We conducted experiments with 12 different environments, including both deterministic and stochastic domains and single and multi-agent domains (see Figure 3). Frozen Lake [33] represents a stochastic grid navigation task... As demonstrated in Example 1, Taxi is an extension of the similar OpenAI domain (which in turn is based on [15])... We also used seven PDDLGym domains [34]: Sokoban, Blocks World, Towers of Hanoi, Snake, Rearrangement, Triangle Tireworld, and Exploding Blocks. (An illustrative environment-instantiation sketch follows the table.) |
| Dataset Splits | No | The paper mentions training agents for a certain number of episodes but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing subsets of the data itself. It focuses on the training process within the environments. |
| Hardware Specification | Yes | Experiments were run on a cluster using six CPUs, each with four cores and 16GB RAM. |
| Software Dependencies | No | For the single-agent settings, we used DQN [36], CEM [39], and SARSA [23] from the keras-rl library, as well as Q-learning [40]. For the multi-agent domains, we used PPO [37] from keras-rl. The paper mentions software and libraries like keras-rl and PDDLGym but does not provide specific version numbers for these dependencies. (A minimal keras-rl setup sketch follows the table.) |
| Experiment Setup | Yes | Agents were trained for 600,000 – 1,000,000 episodes in each environment, with a maximum of 60 steps per episode. We used five parameterized transform types: state space reduction [29], likely outcome relaxation [29], precondition relaxation [22], all outcome determinization (for stochastic domains) [41], and delete relaxation [9]. We limited the depth of the search tree to three. (Illustrative training-loop and transform-search sketches follow the table.) |
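
The environments listed in the Open Datasets row are available through OpenAI Gym and PDDLGym. The sketch below shows how such environments might be instantiated; the registry IDs are the standard Gym/PDDLGym names and the pre-0.26 Gym API is assumed, so these are illustrative assumptions rather than values taken from the RLPE repository.

```python
# Illustrative only: instantiating the kinds of environments the paper reports.
# Environment IDs are the standard Gym / PDDLGym registry names (assumed, not
# copied from the RLPE code), and the pre-0.26 Gym reset/step API is assumed.
import gym
import pddlgym

frozen_lake = gym.make("FrozenLake-v1")      # stochastic grid navigation
taxi = gym.make("Taxi-v3")                   # the paper's Taxi domain extends this

# PDDLGym registers each PDDL domain as a Gym environment.
sokoban = pddlgym.make("PDDLEnvSokoban-v0")
obs, debug_info = sokoban.reset()
action = sokoban.action_space.sample(obs)    # PDDLGym grounds actions against the current state
obs, reward, done, debug_info = sokoban.step(action)
```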
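
No versions are pinned for the libraries above. For orientation, a minimal DQN agent setup with keras-rl (assuming the tf.keras-compatible keras-rl2 fork) might look like the following; the network size, hyperparameters, and the CartPole placeholder environment are illustrative assumptions, not values from the paper.

```python
# Minimal keras-rl DQN setup, for illustration only. The hyperparameters, network
# architecture, and CartPole placeholder environment are assumptions; the paper's
# own configuration lives in the RLPE repository.
import gym
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.optimizers import Adam

from rl.agents.dqn import DQNAgent
from rl.memory import SequentialMemory
from rl.policy import EpsGreedyQPolicy

env = gym.make("CartPole-v1")                # placeholder environment
nb_actions = env.action_space.n

model = Sequential([
    Flatten(input_shape=(1,) + env.observation_space.shape),
    Dense(32, activation="relu"),
    Dense(32, activation="relu"),
    Dense(nb_actions, activation="linear"),
])

agent = DQNAgent(
    model=model,
    nb_actions=nb_actions,
    memory=SequentialMemory(limit=50_000, window_length=1),
    policy=EpsGreedyQPolicy(eps=0.1),
    nb_steps_warmup=100,
    target_model_update=1e-2,
)
agent.compile(Adam(learning_rate=1e-3), metrics=["mae"])
agent.fit(env, nb_steps=50_000, verbose=1)
```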
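
The tabular Q-learning runs can be pictured with the training limits reported in the Experiment Setup row (600,000 – 1,000,000 episodes, at most 60 steps per episode). The learning rate, discount factor, and exploration rate below are assumed values for illustration; the environment is the Taxi domain named in the Open Datasets row, and the pre-0.26 Gym API is assumed.

```python
# Tabular Q-learning loop using the episode and step caps reported in the paper.
# alpha, gamma, and epsilon are assumed hyperparameters (not reported above).
import numpy as np
import gym

env = gym.make("Taxi-v3")
Q = np.zeros((env.observation_space.n, env.action_space.n))

alpha, gamma, epsilon = 0.1, 0.99, 0.1       # assumed
n_episodes, max_steps = 600_000, 60          # lower end of the reported range; 60-step cap

for _ in range(n_episodes):
    state = env.reset()
    for _ in range(max_steps):
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # standard one-step Q-learning update
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
        if done:
            break
```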
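
The Experiment Setup row names five parameterized transform types and a search-tree depth limit of three, and the Pseudocode row notes that the paper describes its Dijkstra-like search only in text. The schematic below is not a reconstruction of the paper's algorithm; it only illustrates a generic cost-ordered, depth-limited search over transform sequences, and every helper name (apply_transform, explains, transform_cost) is a hypothetical placeholder.

```python
# Schematic only: a generic Dijkstra-like (uniform-cost), depth-limited search over
# sequences of model transforms. apply_transform, explains, and transform_cost are
# hypothetical placeholders, not functions from the paper or the RLPE repository.
import heapq
import itertools

TRANSFORM_TYPES = [
    "state_space_reduction",
    "likely_outcome_relaxation",
    "precondition_relaxation",
    "all_outcome_determinization",
    "delete_relaxation",
]
MAX_DEPTH = 3  # depth limit reported in the Experiment Setup row

def search_transform_sequence(model, apply_transform, explains, transform_cost):
    """Return the cheapest transform sequence whose transformed model passes the
    explains() check, or None if nothing is found within the depth limit."""
    tie = itertools.count()                      # tie-breaker so models are never compared
    frontier = [(0.0, 0, next(tie), [], model)]  # (cost, depth, tie, sequence, model)
    while frontier:
        cost, depth, _, seq, m = heapq.heappop(frontier)
        if explains(m):
            return seq
        if depth == MAX_DEPTH:
            continue
        for t in TRANSFORM_TYPES:
            heapq.heappush(
                frontier,
                (cost + transform_cost(t), depth + 1, next(tie), seq + [t], apply_transform(m, t)),
            )
    return None
```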