StateMask: Explaining Deep Reinforcement Learning through State Mask

Authors: Zelei Cheng, Xian Wu, Jiahao Yu, Wenhai Sun, Wenbo Guo, Xinyu Xing

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We evaluate StateMask in various popular RL environments and show its superiority over existing explainers in explanation fidelity. We also show that StateMask has better utilities, such as launching adversarial attacks and patching policy errors."
Researcher Affiliation | Academia | Zelei Cheng (Northwestern University, Evanston, IL 60208, zelei.cheng@northwestern.edu); Xian Wu (Northwestern University, Evanston, IL 60208, xianwu2024@u.northwestern.edu); Jiahao Yu (Northwestern University, Evanston, IL 60208, jiahao.yu@northwestern.edu); Wenhai Sun (Purdue University, West Lafayette, IN 47907, sun841@purdue.edu); Wenbo Guo (Purdue University, West Lafayette, IN 47907, henrygwb@purdue.edu); Xinyu Xing (Northwestern University, Evanston, IL 60208, xinyu.xing@northwestern.edu)
Pseudocode | Yes | "Algorithm 1: The learning algorithm for training the mask net."
Open Source Code | Yes | "The source code of StateMask can be found in https://github.com/nuwuxian/RL-state_mask."
Open Datasets | Yes | "We select 10 representative environments to demonstrate the effectiveness of StateMask across four types of environments: simple normal-form game (Cart Pole, Pendulum, and Pong [48]), sophisticated normal-form game (You-Shall-Not-Pass [49], Kick-And-Defend [49], and StarCraft II [50]), perfect-information (simple) extensive-form game (Connect 4, Tic-Tac-Toe and Breakthrough [51]), and imperfect-information (sophisticated) extensive-form game (Dou Dizhu [52])."
Dataset Splits | No | The paper does not explicitly state training, validation, and test splits for the environments used, whether as percentages, sample counts, or references to predefined splits. It mentions that experiments are run for 500 rounds/trajectories, but this is not a data split in the conventional supervised-learning sense.
Hardware Specification | Yes | "In our experiment, we use a server with 2 AMD EPYC 7702 64-Core CPU Processors and 4 NVIDIA RTX A6000 GPUs to train and evaluate our method."
Software Dependencies | No | The paper states: "We implement our training algorithm using PyTorch [9] and additionally discuss the hyperparameter setting of our training algorithm." PyTorch is named but without a version number, and no other software dependencies are listed with versions.
Experiment Setup | Yes | "Regarding the clipping parameter ϵ in Eqn. (14), we set it as 0.2. As for the advantage estimation in Eqn. (15), we set γ = 0.99 and ζ = 0.95. Moreover, we provide the game-specific hyperparameter choices in Table 1."
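The hyperparameters quoted above (clipping parameter ϵ = 0.2, γ = 0.99, ζ = 0.95) match the standard PPO clipped surrogate objective and generalized advantage estimation (GAE), where ζ plays the role of the GAE λ. A minimal sketch of how these values would enter the computation, assuming that standard formulation (the function names are illustrative, not from the paper's code):

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, zeta=0.95):
    """Generalized advantage estimation; zeta is the GAE lambda (ζ in the paper).

    `values` has length len(rewards) + 1 (a bootstrap value is appended).
    """
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # one-step TD residual
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # exponentially weighted sum of residuals
        gae = delta + gamma * zeta * gae
        advantages[t] = gae
    return advantages

def ppo_clip_loss(ratio, advantage, epsilon=0.2):
    """PPO clipped surrogate loss (to be minimized) for a single sample.

    `ratio` is pi_new(a|s) / pi_old(a|s); clipping to [1-ε, 1+ε] limits
    how far a single update can move the policy.
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return -min(unclipped, clipped)
```

For example, with a policy ratio of 1.5 and a positive advantage, the clipped term (1.2 × advantage) dominates, so increasing the ratio further yields no extra objective gain.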