Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions
Authors: Zhengxian Lin, Kin-Ho Lam, Alan Fern
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations. Below we introduce our domains and experiments, which address these questions: 1) (Section 6.2) Can we learn ESP models that perform as well as standard models? 2) (Section 6.2) Do the learned ESP models have accurate GVFs? 3) (Section 6.3) Do our explanations provide meaningful insight? |
| Researcher Affiliation | Academia | Zhengxian Lin, Kin-Ho Lam, Alan Fern Department of EECS Oregon State University {linzhe, lamki, alan.fern}@oregonstate.edu |
| Pseudocode | Yes | Appendix A (ESP-DQN Pseudo-code): The pseudo-code for ESP-DQN is given in Algorithm 1; Algorithm 2 (ESP-Table) gives pseudo-code for a table-based variant of ESP-DQN. |
| Open Source Code | Yes | The ESP agent code is provided in Supplementary Material, including pre-trained models for all domains we present. |
| Open Datasets | Yes | Lunar Lander. We use the standard OpenAI Gym version of Lunar Lander... Cart Pole. We use the standard OpenAI Gym Cart Pole environment... |
| Dataset Splits | No | Given a mini-batch, the update to $\theta_C$ is based on an L2 loss with a target value for sample $i$ of $y_i = r_i + \beta \hat{Q}(s'_i, \hat{a}_i)$, where $\hat{a}_i = \arg\max_a \hat{Q}(s'_i, a)$ is the greedy action of the target network (see the sketch after this table). |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PySC2 for StarCraft 2' and 'OpenAI Gym' environments, along with 'Adam' and 'SGD' optimizers, but does not specify version numbers for these software components. |
| Experiment Setup | Yes | Table 1 gives the hyperparameters used in our implementation. Table 2 presents our GVF network structures used to train the agents in each domain. |
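The target-value formula quoted in the Dataset Splits row is a standard DQN-style bootstrap. The snippet below is a minimal sketch of that computation, assuming PyTorch and a hypothetical `target_net` module that maps a batch of states to per-action Q-values; it is illustrative only, not the authors' released code, and terminal-state masking is omitted for brevity.

```python
import torch

def td_targets(rewards, next_states, target_net, beta=0.99):
    """Sketch of the quoted target computation:
    y_i = r_i + beta * Q_hat(s'_i, a_hat_i), where a_hat_i is the greedy
    action of the target network. Shapes: rewards [B], next_states [B, ...].
    """
    with torch.no_grad():
        next_q = target_net(next_states)               # [B, num_actions]
        greedy_actions = next_q.argmax(dim=1)          # a_hat_i from the target network
        bootstrap = next_q.gather(1, greedy_actions.unsqueeze(1)).squeeze(1)
    return rewards + beta * bootstrap                  # y_i for the L2 loss
```

Per the quoted passage, the resulting $y_i$ would then be regressed against the network's prediction for $(s_i, a_i)$ with an L2 (MSE) loss to update $\theta_C$.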