StateMask: Explaining Deep Reinforcement Learning through State Mask
Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Wenhai Sun, Wenbo Guo, Xinyu Xing
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate StateMask in various popular RL environments and show its superiority over existing explainers in explanation fidelity. We also show that StateMask has better utilities, such as launching adversarial attacks and patching policy errors. |
| Researcher Affiliation | Academia | Zelei Cheng, Northwestern University, Evanston, IL 60208, zelei.cheng@northwestern.edu; Xian Wu, Northwestern University, Evanston, IL 60208, xianwu2024@u.northwestern.edu; Jiahao Yu, Northwestern University, Evanston, IL 60208, jiahao.yu@northwestern.edu; Wenhai Sun, Purdue University, West Lafayette, IN 47907, sun841@purdue.edu; Wenbo Guo, Purdue University, West Lafayette, IN 47907, henrygwb@purdue.edu; Xinyu Xing, Northwestern University, Evanston, IL 60208, xinyu.xing@northwestern.edu |
| Pseudocode | Yes | Algorithm 1: The learning algorithm for training the mask net. |
| Open Source Code | Yes | The source code of StateMask can be found at https://github.com/nuwuxian/RL-state_mask. |
| Open Datasets | Yes | We select 10 representative environments to demonstrate the effectiveness of StateMask across four types of environments: simple normal-form games (Cart Pole, Pendulum, and Pong [48]), sophisticated normal-form games (You-Shall-Not-Pass [49], Kick-And-Defend [49], and StarCraft II [50]), perfect-information (simple) extensive-form games (Connect 4, Tic-Tac-Toe, and Breakthrough [51]), and an imperfect-information (sophisticated) extensive-form game (Dou Dizhu [52]). |
| Dataset Splits | No | The paper does not explicitly state the training, validation, and test dataset splits using percentages, sample counts, or by referencing predefined splits with citations for all datasets used. While it mentions experiments are run for 500 rounds/trajectories, it doesn't specify data splits in the conventional sense for supervised learning datasets. |
| Hardware Specification | Yes | In our experiment, we use a server with 2 AMD EPYC 7702 64-Core CPU Processors and 4 NVIDIA RTX A6000 GPUs to train and evaluate our method. |
| Software Dependencies | No | The paper states: "We implement our training algorithm using PyTorch [9] and additionally discuss the hyperparameter setting of our training algorithm." It mentions PyTorch but does not provide a specific version number. No other software dependencies are listed with versions. |
| Experiment Setup | Yes | Regarding the clipping parameter ϵ in Eqn. (14), we set it as 0.2. As for the advantage estimation in Eqn. (15), we set γ = 0.99 and ζ = 0.95. Moreover, we provide the game-specific hyperparameter choices in Table 1. (See the sketch below the table for how these settings are conventionally used.) |
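The experiment-setup row quotes standard PPO-style hyperparameters: a clipping parameter ϵ = 0.2 for the surrogate objective (the paper's Eqn. (14)) and γ = 0.99, ζ = 0.95 for advantage estimation (Eqn. (15)). The paper's own equations are not reproduced here; the sketch below shows the generic clipped surrogate loss and generalized advantage estimation (GAE) that these symbols conventionally parameterize, written in PyTorch since that is the framework the paper reports. The function names, tensor shapes, and `dones` convention are illustrative assumptions, not taken from the paper.

```python
import torch

# Hyperparameters quoted in the paper's experiment setup:
# clipping parameter eps = 0.2 (Eqn. (14)); gamma = 0.99 and zeta = 0.95 (Eqn. (15)).
EPS, GAMMA, ZETA = 0.2, 0.99, 0.95

def gae_advantages(rewards: torch.Tensor, values: torch.Tensor,
                   dones: torch.Tensor) -> torch.Tensor:
    """Generalized advantage estimation with discount GAMMA and smoothing ZETA.

    Assumed shapes (illustrative): rewards and dones are [T]; values is [T + 1],
    where values[T] bootstraps the value of the state after the last step.
    """
    T = rewards.shape[0]
    advantages = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]  # zero out the bootstrap at episode ends
        delta = rewards[t] + GAMMA * values[t + 1] * nonterminal - values[t]
        running = delta + GAMMA * ZETA * nonterminal * running
        advantages[t] = running
    return advantages

def clipped_surrogate_loss(log_probs: torch.Tensor, old_log_probs: torch.Tensor,
                           advantages: torch.Tensor) -> torch.Tensor:
    """Clipped PPO surrogate: -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)]."""
    ratio = torch.exp(log_probs - old_log_probs)  # importance ratio r
    clipped = torch.clamp(ratio, 1.0 - EPS, 1.0 + EPS)
    return -torch.min(ratio * advantages, clipped * advantages).mean()
```

In a typical PPO-style training loop the advantages would also be normalized per batch before computing the loss; the paper defers the remaining, game-specific hyperparameter choices to its Table 1.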