Explainable Reinforcement Learning via Model Transforms

Authors: Mira Finkelstein, Nitsan Levy, Lucy Liu, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenschein, Sarah Keren

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the approach on a set of standard benchmarks.
Researcher Affiliation | Collaboration | 1 The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering; 2 Harvard University, School of Engineering and Applied Sciences; 3 DeepMind; 4 Technion - Israel Institute of Technology, Taub Faculty of Computer Science
Pseudocode | No | The paper describes methods such as a Dijkstra-like search and policy updates in text, but it does not include a formal pseudocode block or an algorithm figure. (A hedged sketch of such a search appears after this table.)
Open Source Code | Yes | Our complete dataset and code can be found at https://github.com/sarah-keren/RLPE.git
Open Datasets | Yes | Environments: We conducted experiments with 12 different environments, including both deterministic and stochastic domains and single- and multi-agent domains (see Figure 3). Frozen Lake [33] represents a stochastic grid navigation task... As demonstrated in Example 1, Taxi is an extension of the similar OpenAI domain (which in turn is based on [15])... We also used seven PDDLGym domains [34]: Sokoban, Blocks World, Towers of Hanoi, Snake, Rearrangement, Triangle Tireworld, and Exploding Blocks.
Dataset Splits | No | The paper mentions training agents for a certain number of episodes but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing; it focuses on the training process within the environments.
Hardware Specification | Yes | Experiments were run on a cluster using six CPUs, each with four cores and 16GB RAM.
Software Dependencies | No | For the single-agent settings, we used DQN [36], CEM [39], and SARSA [23] from the keras-rl library, as well as Q-learning [40]. For the multi-agent domains, we used PPO [37] from keras-rl. The paper mentions software and libraries such as keras-rl and PDDLGym but does not provide specific version numbers for these dependencies. (An assumed version-recording helper appears below.)
Experiment Setup | Yes | Agents were trained for 600,000 – 1,000,000 episodes in each environment, with a maximum of 60 steps per episode. We used five parameterized transform types: state space reduction [29], likely outcome relaxation [29], precondition relaxation [22], all outcome determinization (for stochastic domains) [41], and delete relaxation [9]. We limited the depth of the search tree to three. (A hedged configuration sketch follows the table.)
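Since the paper describes its Dijkstra-like search over model transforms only in prose, the following is a minimal sketch of how a depth-limited, cheapest-first search over transform sequences could look. The environment object, the transform functions, and the `explains` predicate are placeholders assumed here for illustration; this is not the authors' implementation.

```python
import heapq
from typing import Callable, List, Optional, Tuple

# A transform maps an environment model to a relaxed/transformed model.
Transform = Callable[[object], object]

def search_for_explanation(env,
                           transforms: List[Tuple[str, float, Transform]],
                           explains: Callable[[object], bool],
                           max_depth: int = 3) -> Optional[Tuple[List[str], float]]:
    """Dijkstra-like search: return the cheapest transform sequence whose
    resulting model satisfies `explains`, or None within `max_depth`."""
    counter = 0  # unique tie-breaker so heapq never compares models directly
    frontier = [(0.0, counter, [], env)]  # (cost, tie, applied transform names, model)
    while frontier:
        cost, _, applied, model = heapq.heappop(frontier)
        if explains(model):
            return applied, cost
        if len(applied) >= max_depth:  # paper limits the search tree depth to three
            continue
        for name, t_cost, transform in transforms:
            counter += 1
            heapq.heappush(
                frontier,
                (cost + t_cost, counter, applied + [name], transform(model)),
            )
    return None
```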
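The reported experiment setup can be restated as a configuration block. The values below come directly from the quoted excerpt; the key names themselves are illustrative and not taken from the released code.

```python
# Illustrative restatement of the reported setup; key names are assumptions,
# values are taken from the paper's quoted experiment description.
EXPERIMENT_CONFIG = {
    "episodes_per_environment": (600_000, 1_000_000),  # reported training range
    "max_steps_per_episode": 60,
    "transform_types": [
        "state_space_reduction",        # [29]
        "likely_outcome_relaxation",    # [29]
        "precondition_relaxation",      # [22]
        "all_outcome_determinization",  # stochastic domains only [41]
        "delete_relaxation",            # [9]
    ],
    "max_search_depth": 3,
    "single_agent_algorithms": ["DQN", "CEM", "SARSA", "Q-learning"],
    "multi_agent_algorithm": "PPO",
}
```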
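Because no library versions are reported, a reproduction would need to record them independently. The snippet below is a small assumed helper for logging the installed versions of the named libraries; the PyPI package names listed are assumptions, not confirmed by the paper.

```python
# Hypothetical helper: record the versions actually installed, since the paper
# names keras-rl and PDDLGym but gives no version numbers.
import importlib.metadata

for pkg in ["keras-rl", "pddlgym", "gym", "tensorflow"]:  # assumed package names
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")
```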