Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
Authors: Zhengxian Lin, Kin-Ho Lam, Alan Fern
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations. Below we introduce our domains and experiments, which address these questions: 1) (Section 6.2) Can we learn ESP models that perform as well as standard models? 2) (Section 6.2) Do the learned ESP models have accurate GVFs? 3) (Section 6.3) Do our explanations provide meaningful insight? |
| Researcher Affiliation | Academia | Zhengxian Lin, Kin-Ho Lam, Alan Fern Department of EECS Oregon State University {linzhe, lamki, alan.fern}@oregonstate.edu |
| Pseudocode | Yes | Appendix A (ESP-DQN Pseudo-code): The pseudo-code for ESP-DQN is given in Algorithm 1; Algorithm 2 (ESP-Table) gives pseudo-code for a table-based variant of ESP-DQN. |
| Open Source Code | Yes | The ESP agent code is provided in Supplementary Material, including pre-trained models for all domains we present. |
| Open Datasets | Yes | Lunar Lander. We use the standard OpenAI Gym version of Lunar Lander... Cart Pole. We use the standard OpenAI Gym Cart Pole environment... |
| Dataset Splits | No | Given a mini-batch, the update to $\theta_C$ is based on an L2 loss with a target value for sample $i$ of $y_i = r_i + \beta \hat{Q}(s'_i, \hat{a}_i)$, where $\hat{a}_i = \arg\max_a \hat{Q}(s'_i, a)$ is the greedy action of the target network (see the sketch after this table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PySC2 for StarCraft 2' and 'OpenAI Gym' environments, along with 'Adam' and 'SGD' optimizers, but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Table 1 gives the hyperparameters used in our implementation. Table 2 presents our GVF network structures used to train the agents in each domain. |
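The target-value formula quoted in the Dataset Splits row is a standard DQN-style bootstrap. The snippet below is a minimal sketch of that computation, assuming PyTorch and a hypothetical `target_net` module that maps a batch of states to per-action Q-values; it is illustrative only, not the authors' released code, and terminal-state masking is omitted for brevity.

```python
import torch

def td_targets(rewards, next_states, target_net, beta=0.99):
    """Sketch of the quoted target computation:
    y_i = r_i + beta * Q_hat(s'_i, a_hat_i), where a_hat_i is the greedy
    action of the target network. Shapes: rewards [B], next_states [B, ...].
    """
    with torch.no_grad():
        next_q = target_net(next_states)               # [B, num_actions]
        greedy_actions = next_q.argmax(dim=1)          # a_hat_i from the target network
        bootstrap = next_q.gather(1, greedy_actions.unsqueeze(1)).squeeze(1)
    return rewards + beta * bootstrap                  # y_i for the L2 loss
```

Per the quoted passage, the resulting $y_i$ would then be regressed against the network's prediction for $(s_i, a_i)$ with an L2 (MSE) loss to update $\theta_C$.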