Toward Policy Explanations for Multi-Agent Reinforcement Learning

Authors: Kayla Boggess, Sarit Kraus, Lu Feng

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction.
Researcher Affiliation | Academia | 1 University of Virginia, 2 Bar-Ilan University; {kjb5we, lu.feng}@virginia.edu, sarit@cs.biu.ac.il
Pseudocode | Yes | Algorithm 1 shows the proposed method, which takes as input a policy abstraction M and a set of predicates Fc representing the completion of tasks (subgoals) in a given MARL domain. Algorithm 2 presents both the baseline and proposed methods for answering 'When do agents Gq do actions Aq?' (a hypothetical query interface is sketched after this table).
Open Source Code | No | The paper mentions using a third-party implementation ('Our implementation used the Shared Experience Actor-Critic [Christianos et al., 2020]') and refers to an appendix in a related paper for pseudocode, but it does not provide a direct link to, or an explicit statement that, their own implementation of the described methods is open source.
Open Datasets | Yes | The second and third domains are benchmarks taken from [Papoudakis et al., 2021]. Multi-robot warehouse (RWARE) considers multiple robotic agents cooperatively delivering requested items. Level-based foraging (LBF) considers a mixed cooperative-competitive game where agents must navigate a grid world to collect randomly scattered food. (A sketch of loading these benchmarks appears after this table.)
Dataset Splits | No | The paper states 'All models were trained and evaluated to 10,000 steps, or until converging to the expected reward, whichever occurred first,' but does not specify a distinct validation split for model tuning.
Hardware Specification | Yes | The experiments were run on a laptop with a 1.4 GHz Quad-Core Intel i5 processor and 8 GB RAM.
Software Dependencies | No | The paper states 'Our implementation used the Shared Experience Actor-Critic [Christianos et al., 2020]' but does not give version numbers for software dependencies such as the programming language or libraries.
Experiment Setup | Yes | All models were trained and evaluated to 10,000 steps, or until converging to the expected reward, whichever occurred first. (A sketch of this stopping rule appears after this table.)
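
The Pseudocode row describes Algorithm 2 as answering queries of the form 'When do agents Gq do actions Aq?'. The following is a minimal, hypothetical sketch of such a query, assuming the policy abstraction can be modeled as a mapping from abstract states to joint actions; the names and data structures are illustrative assumptions, not the authors' implementation.

```python
def when_do_agents_act(policy_abstraction, queried_agents, queried_actions):
    """Return the abstract states in which each queried agent takes its queried action."""
    matching_states = []
    for abstract_state, joint_action in policy_abstraction.items():
        # joint_action: dict mapping agent id -> action taken in this abstract state
        if all(joint_action.get(agent) == action
               for agent, action in zip(queried_agents, queried_actions)):
            matching_states.append(abstract_state)
    return matching_states

# Example usage with toy data:
abstraction = {
    "near_shelf": {"robot_1": "pick", "robot_2": "move"},
    "at_goal":    {"robot_1": "drop", "robot_2": "pick"},
}
print(when_do_agents_act(abstraction, ["robot_1"], ["pick"]))  # ['near_shelf']
```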
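
The Open Datasets row names the RWARE and LBF benchmarks from [Papoudakis et al., 2021], which are distributed as Gym-compatible packages. Below is a minimal sketch of loading them; the package names (rware, lbforaging) and environment IDs are assumptions that depend on the installed versions and are not specified in the paper.

```python
import gym
import rware        # registers "rware-*" multi-robot warehouse environments
import lbforaging   # registers "Foraging-*" level-based foraging environments

# Assumed IDs: a tiny warehouse with 2 agents, and an 8x8 foraging grid
# with 2 players and 1 food item.
warehouse = gym.make("rware-tiny-2ag-v1")
foraging = gym.make("Foraging-8x8-2p-1f-v2")

obs = warehouse.reset()
# Both benchmarks expose per-agent action spaces; take a random joint action
# (return signature assumes the classic 4-tuple Gym step API).
obs, rewards, dones, info = warehouse.step(warehouse.action_space.sample())
```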
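
The Experiment Setup row reports training and evaluating for up to 10,000 steps, or until convergence to the expected reward, whichever occurs first. The sketch below illustrates that stopping rule; the `agent` interface, convergence threshold, and averaging window are illustrative assumptions, not the authors' code.

```python
MAX_STEPS = 10_000
EXPECTED_REWARD = 1.0   # assumed domain-specific convergence target
WINDOW = 10             # assumed number of episodes averaged to test convergence

def train(env, agent, max_steps=MAX_STEPS, expected_reward=EXPECTED_REWARD):
    episode_returns, current_return = [], 0.0
    obs = env.reset()
    for step in range(max_steps):
        action = agent.act(obs)
        obs, reward, done, _ = env.step(action)
        # sum per-agent rewards when the environment returns one reward per agent
        current_return += sum(reward) if isinstance(reward, (list, tuple)) else reward
        agent.update(obs, reward, done)
        episode_over = all(done) if isinstance(done, (list, tuple)) else done
        if episode_over:
            episode_returns.append(current_return)
            current_return, obs = 0.0, env.reset()
            recent = episode_returns[-WINDOW:]
            if len(recent) == WINDOW and sum(recent) / WINDOW >= expected_reward:
                break   # converged to the expected reward before max_steps
    return episode_returns
```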