Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
Authors: Zhengxian Lin, Kin-Ho Lam, Alan Fern
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations. Below we introduce our domains and experiments, which address these questions: 1) (Section 6.2) Can we learn ESP models that perform as well as standard models? 2) (Section 6.2) Do the learned ESP models have accurate GVFs? 3) (Section 6.3) Do our explanations provide meaningful insight? |
| Researcher Affiliation | Academia | Zhengxian Lin, Kin-Ho Lam, Alan Fern Department of EECS Oregon State University EMAIL |
| Pseudocode | Yes | Appendix A (ESP-DQN Pseudo-code): The pseudo-code for ESP-DQN is given in Algorithm 1. Algorithm 2 (ESP-Table) gives pseudo-code for a table-based variant of ESP-DQN. |
| Open Source Code | Yes | The ESP agent code is provided in Supplementary Material, including pre-trained models for all domains we present. |
| Open Datasets | Yes | Lunar Lander. We use the standard OpenAI Gym version of Lunar Lander... Cart Pole. We use the standard OpenAI Gym Cart Pole environment... |
| Dataset Splits | No | Given a mini-batch, the update to θ_C is based on an L2 loss with the target value for sample i being y_i = r_i + β Q̂(s'_i, â_i), where â_i = argmax_a Q̂(s'_i, a) is the greedy action of the target network. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PySC2 for StarCraft II' and 'OpenAI Gym' environments, along with 'Adam' and 'SGD' optimizers, but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Table 1 gives the hyperparameters used in our implementation. Table 2 presents our GVF network structures used to train the agents in each domain. |
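The target-value computation quoted under Dataset Splits is the standard DQN-style bootstrapped target, y_i = r_i + β Q̂(s'_i, â_i) with â_i the greedy action of the target network. A minimal sketch of that computation is below; the function name `dqn_targets` and the `q_target` callable are illustrative assumptions, not part of the paper's released code.

```python
import numpy as np

def dqn_targets(rewards, next_states, q_target, beta):
    """Compute L2-loss targets y_i = r_i + beta * Q_hat(s'_i, a_hat_i),
    where a_hat_i = argmax_a Q_hat(s'_i, a) is the greedy action under
    the target network. `q_target` is a hypothetical callable that maps
    a next state s'_i to a vector of action values from the target net."""
    targets = []
    for r, s_next in zip(rewards, next_states):
        q_vals = q_target(s_next)       # target-network action values at s'_i
        a_hat = int(np.argmax(q_vals))  # greedy action under the target net
        targets.append(r + beta * q_vals[a_hat])
    return np.array(targets)
```

In practice the target network's parameters are a periodically synced copy of the online network's, which keeps the bootstrap target stable during training.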