Explainable Reinforcement Learning via Model Transforms

Authors: Mira Finkelstein, Nitsan Levy, Lucy Liu, Yoav Kolumbus, David C. Parkes, Jeffrey S. Rosenschein, Sarah Keren

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the approach on a set of standard benchmarks.
Researcher Affiliation | Collaboration | 1 The Hebrew University of Jerusalem, Benin School of Computer Science and Engineering; 2 Harvard University, School of Engineering and Applied Sciences; 3 DeepMind; 4 Technion - Israel Institute of Technology, Taub Faculty of Computer Science
Pseudocode | No | The paper describes methods such as a Dijkstra-like search and policy updates in text, but it does not include a formal pseudocode block or an algorithm figure. (A hedged sketch of such a search appears after this table.)
Open Source Code | Yes | Our complete dataset and code can be found at https://github.com/sarah-keren/RLPE.git
Open Datasets | Yes | Environments: We conducted experiments with 12 different environments, including both deterministic and stochastic domains and single- and multi-agent domains (see Figure 3). Frozen Lake [33] represents a stochastic grid navigation task... As demonstrated in Example 1, Taxi is an extension of the similar OpenAI domain (which in turn is based on [15])... We also used seven PDDLGym domains [34]: Sokoban, Blocks World, Towers of Hanoi, Snake, Rearrangement, Triangle Tireworld, and Exploding Blocks.
Dataset Splits | No | The paper mentions training agents for a certain number of episodes but does not specify explicit dataset splits (e.g., percentages or counts) for training, validation, or testing; it focuses on the training process within the environments.
Hardware Specification | Yes | Experiments were run on a cluster using six CPUs, each with four cores and 16GB RAM.
Software Dependencies | No | For the single-agent settings, we used DQN [36], CEM [39], and SARSA [23] from the keras-rl library, as well as Q-learning [40]. For the multi-agent domains, we used PPO [37] from keras-rl. The paper mentions software and libraries such as keras-rl and PDDLGym but does not provide specific version numbers for these dependencies. (An assumed version-recording helper appears below.)
Experiment Setup | Yes | Agents were trained for 600,000 – 1,000,000 episodes in each environment, with a maximum of 60 steps per episode. We used five parameterized transform types: state space reduction [29], likely outcome relaxation [29], precondition relaxation [22], all outcome determinization (for stochastic domains) [41], and delete relaxation [9]. We limited the depth of the search tree to three. (A hedged configuration sketch follows the table.)
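Since the paper describes its Dijkstra-like search over model transforms only in prose, the following is a minimal sketch of how a depth-limited, cheapest-first search over transform sequences could look. The environment object, the transform functions, and the `explains` predicate are placeholders assumed here for illustration; this is not the authors' implementation.

```python
import heapq
from typing import Callable, List, Optional, Tuple

# A transform maps an environment model to a relaxed/transformed model.
Transform = Callable[[object], object]

def search_for_explanation(env,
                           transforms: List[Tuple[str, float, Transform]],
                           explains: Callable[[object], bool],
                           max_depth: int = 3) -> Optional[Tuple[List[str], float]]:
    """Dijkstra-like search: return the cheapest transform sequence whose
    resulting model satisfies `explains`, or None within `max_depth`."""
    counter = 0  # unique tie-breaker so heapq never compares models directly
    frontier = [(0.0, counter, [], env)]  # (cost, tie, applied transform names, model)
    while frontier:
        cost, _, applied, model = heapq.heappop(frontier)
        if explains(model):
            return applied, cost
        if len(applied) >= max_depth:  # paper limits the search tree depth to three
            continue
        for name, t_cost, transform in transforms:
            counter += 1
            heapq.heappush(
                frontier,
                (cost + t_cost, counter, applied + [name], transform(model)),
            )
    return None
```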
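The reported experiment setup can be restated as a configuration block. The values below come directly from the quoted excerpt; the key names themselves are illustrative and not taken from the released code.

```python
# Illustrative restatement of the reported setup; key names are assumptions,
# values are taken from the paper's quoted experiment description.
EXPERIMENT_CONFIG = {
    "episodes_per_environment": (600_000, 1_000_000),  # reported training range
    "max_steps_per_episode": 60,
    "transform_types": [
        "state_space_reduction",        # [29]
        "likely_outcome_relaxation",    # [29]
        "precondition_relaxation",      # [22]
        "all_outcome_determinization",  # stochastic domains only [41]
        "delete_relaxation",            # [9]
    ],
    "max_search_depth": 3,
    "single_agent_algorithms": ["DQN", "CEM", "SARSA", "Q-learning"],
    "multi_agent_algorithm": "PPO",
}
```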
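Because no library versions are reported, a reproduction would need to record them independently. The snippet below is a small assumed helper for logging the installed versions of the named libraries; the PyPI package names listed are assumptions, not confirmed by the paper.

```python
# Hypothetical helper: record the versions actually installed, since the paper
# names keras-rl and PDDLGym but gives no version numbers.
import importlib.metadata

for pkg in ["keras-rl", "pddlgym", "gym", "tensorflow"]:  # assumed package names
    try:
        print(pkg, importlib.metadata.version(pkg))
    except importlib.metadata.PackageNotFoundError:
        print(pkg, "not installed")
```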