Explainable Reinforcement Learning through a Causal Lens

Authors: Prashan Madumal, Tim Miller, Liz Sonenberg, Frank Vetere

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We computationally evaluate the model in 6 domains and measure performance and task prediction accuracy. We report on a study with 120 participants who observe agents playing a real-time strategy game (Starcraft II) and then receive explanations of the agents' behaviour.
Researcher Affiliation | Academia | Prashan Madumal, Tim Miller, Liz Sonenberg, Frank Vetere (University of Melbourne, Victoria, Australia); pmathugama@student.unimelb.edu.au, {tmiller, l.sonenberg, f.vetere}@unimelb.edu.au
Pseudocode | Yes | Algorithm 1 Task Prediction: Action Influence Model. Input: trained regression models L, current state St. Output: predicted action a. (A runnable sketch of this algorithm appears after the table.)
Open Source Code | No | No explicit statement or link providing access to the authors' open-source code for the methodology was found in the paper.
Open Datasets | Yes | We evaluate action influence models in 5 OpenAI RL benchmark domains (Brockman et al. 2016) and in the Starcraft II domain. (See the environment-loading sketch below.)
Dataset Splits | No | The paper mentions training phases for RL agents and structural equations ('training phase of the RL agent', 'time taken to train the structural equations'), but does not specify explicit train/validation/test dataset splits (e.g., percentages or counts) for the data used in these processes.
Hardware Specification | No | No specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running the experiments were provided in the paper. General computing environments were not specified either.
Software Dependencies | No | The paper mentions types of regression learners (linear SGD regression, decision tree regression, multilayer perceptron regression) and various RL algorithms (PG, DQN, SARSA, DDQN, PPO, A2C), but does not provide specific software dependencies with version numbers. (The three learner types are illustrated in a sketch after the table.)
Experiment Setup | No | The paper describes the general process of learning structural equations, including using experience replay and updating equations with regression learners on mini-batches, but it does not provide concrete experimental setup details such as hyperparameter values (e.g., learning rates, batch sizes, number of epochs) or optimizer settings for these models. (A replay-buffer training sketch follows.)
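
The Pseudocode row quotes only the header of the paper's Algorithm 1. Below is a minimal Python sketch of the task-prediction idea, under stated assumptions: each (action, variable) pair has a trained regression model serving as a structural equation, the causal graph is iterated in an order where parents precede children, and the predicted action is the one whose simulated effects maximise a goal (reward) variable. The function name, data structures, and scoring rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def predict_action(models, causal_graph, state, actions, goal_var):
    """Hypothetical task prediction with an action influence model.

    models: dict mapping (action, variable) -> trained regressor whose
            .predict takes the variable's parent values and returns the
            variable's next value (the learned structural equation).
    causal_graph: dict mapping variable -> list of parent variables,
            assumed ordered so that parents precede children.
    state: dict mapping variable -> current value.
    actions: iterable of candidate actions.
    goal_var: reward/goal node used to score outcomes (an assumption
            about the scoring rule, not the paper's exact criterion).
    """
    best_action, best_score = None, -np.inf
    for action in actions:
        sim = dict(state)
        # Evaluate each structural equation this action influences,
        # propagating predicted values through the causal graph.
        for var, parents in causal_graph.items():
            if (action, var) in models:
                x = np.array([[sim[p] for p in parents]])
                sim[var] = float(models[(action, var)].predict(x)[0])
        score = sim.get(goal_var, -np.inf)
        if score > best_score:
            best_action, best_score = action, score
    return best_action
```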
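
The Open Datasets row refers to 5 OpenAI Gym benchmark domains. As a sketch of what such a setup typically looks like with the classic (pre-0.26) Gym API, the snippet below loads and steps a few environments; the environment IDs are examples, not necessarily the five domains the paper used.

```python
import gym

# Illustrative environment IDs only; the paper evaluates in 5 OpenAI
# Gym benchmark domains but this snippet does not claim to list them.
for env_id in ["CartPole-v1", "MountainCar-v0", "Taxi-v3"]:
    env = gym.make(env_id)
    state = env.reset()
    done, steps = False, 0
    while not done and steps < 10:
        # Random actions, just to exercise the step interface.
        state, reward, done, info = env.step(env.action_space.sample())
        steps += 1
    env.close()
```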
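
The Software Dependencies row names three regression learner families without versions or settings. The sketch below instantiates those families with scikit-learn; the constructor arguments are placeholders, since the paper reports no hyperparameter values.

```python
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.neural_network import MLPRegressor

# One candidate learner per family named in the paper. All settings
# shown here are assumptions, not the authors' reported choices.
learners = {
    "linear_sgd": SGDRegressor(max_iter=1000, tol=1e-3),
    "decision_tree": DecisionTreeRegressor(max_depth=5),
    "mlp": MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=500),
}
```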
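
The Experiment Setup row notes that structural equations are updated from experience replay with mini-batches, without concrete values. A hedged sketch of that training loop for a single structural equation follows; the buffer size, batch size, and regressor settings are assumptions, as the paper does not report them.

```python
import random
from collections import deque
import numpy as np
from sklearn.linear_model import SGDRegressor

# Replay buffer of (parent values, child value) pairs for one equation.
replay = deque(maxlen=10_000)  # capacity is an assumed placeholder
equation = SGDRegressor()

def store(parent_values, child_value):
    """Record one observed transition for this structural equation."""
    replay.append((np.asarray(parent_values, dtype=float), child_value))

def update(batch_size=32):  # batch size is an assumed placeholder
    """Incrementally refit the equation on a sampled mini-batch."""
    if len(replay) < batch_size:
        return
    batch = random.sample(list(replay), batch_size)
    X = np.stack([x for x, _ in batch])
    y = np.array([y for _, y in batch])
    equation.partial_fit(X, y)  # online mini-batch update
```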