Toward Policy Explanations for Multi-Agent Reinforcement Learning
Authors: Kayla Boggess, Sarit Kraus, Lu Feng
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on three MARL domains demonstrate the scalability of our methods. A user study shows that the generated explanations significantly improve user performance and increase subjective ratings on metrics such as user satisfaction. |
| Researcher Affiliation | Academia | ¹University of Virginia, ²Bar-Ilan University; {kjb5we, lu.feng}@virginia.edu, sarit@cs.biu.ac.il |
| Pseudocode | Yes | Algorithm 1 shows the proposed method, which takes as input a policy abstraction M and a set of predicates F_c representing the completion of tasks (subgoals) in a given MARL domain. Algorithm 2 presents both the baseline and proposed methods for answering 'When do agents G_q do actions A_q?' (an illustrative sketch of this query follows the table). |
| Open Source Code | No | The paper builds on a third-party implementation ('Our implementation used the Shared Experience Actor-Critic [Christianos et al., 2020]') and refers to an appendix in a related paper for pseudocode, but it neither links to the authors' own implementation of the described methods nor states that this code is open source. |
| Open Datasets | Yes | The second and third domains are benchmarks taken from [Papoudakis et al., 2021]. Multi-robot warehouse (RWARE) considers multiple robotic agents cooperatively delivering requested items. Level-based foraging (LBF) considers a mixed cooperative-competitive game where agents must navigate a grid world to collect randomly scattered food. (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper states 'All models were trained and evaluated to 10,000 steps, or until converging to the expected reward, whichever occurred first,' but does not specify the use of a distinct validation dataset split for model tuning. |
| Hardware Specification | Yes | The experiments were run on a laptop with a 1.4 GHz Quad-Core Intel i5 processor and 8 GB RAM. |
| Software Dependencies | No | The paper states 'Our implementation used the Shared Experience Actor-Critic [Christianos et al., 2020]' but does not provide specific version numbers for software dependencies like programming languages or libraries. |
| Experiment Setup | Yes | All models were trained and evaluated to 10,000 steps, or until converging to the expected reward, whichever occurred first. (A sketch of this stopping rule follows the table.) |
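
The Pseudocode row above only quotes the algorithms' inputs and the query they answer. Purely as an illustration, and not the paper's Algorithm 2, the sketch below shows one way a 'When do agents G_q do actions A_q?' query could be evaluated over a policy abstraction whose abstract states record which task-completion predicates hold and which action each agent takes; the `AbstractState` layout and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

@dataclass(frozen=True)
class AbstractState:
    """Hypothetical node of a policy abstraction: which task-completion
    predicates hold here, and which action each agent takes under the policy."""
    predicates: FrozenSet[str]   # e.g. {"item_requested", "shelf_loaded"}
    actions: Dict[str, str]      # agent id -> action taken in this abstract state

def when_do_agents_act(abstraction: List[AbstractState],
                       query_agents: FrozenSet[str],
                       query_actions: FrozenSet[str]) -> List[FrozenSet[str]]:
    """Collect the predicate valuations of abstract states in which every
    queried agent takes one of the queried actions (a naive scan, not the
    paper's Algorithm 2)."""
    conditions = []
    for state in abstraction:
        if all(state.actions.get(agent) in query_actions for agent in query_agents):
            conditions.append(state.predicates)
    return conditions
```

A real explanation generator would also need to simplify and aggregate the returned conditions into readable statements, which this naive scan does not attempt.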
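
The Open Datasets row refers to the multi-robot warehouse (RWARE) and level-based foraging (LBF) benchmarks from [Papoudakis et al., 2021], which are distributed as the `rware` and `lbforaging` Python packages and register Gym environments on import. The sketch below shows typical usage; the exact environment IDs, version suffixes, and the older Gym reset/step signatures are assumptions to verify against the installed packages.

```python
import gym
import rware        # registers multi-robot warehouse (RWARE) environments on import
import lbforaging   # registers level-based foraging (LBF) environments on import

# Environment IDs below are assumptions; check what the installed packages register.
warehouse = gym.make("rware-tiny-2ag-v1")       # assumed: tiny layout, 2 cooperative agents
foraging = gym.make("Foraging-8x8-2p-1f-v2")    # assumed: 8x8 grid, 2 players, 1 food item

obs = warehouse.reset()                          # older Gym API assumed (observation only)
joint_action = warehouse.action_space.sample()   # one discrete action per agent
obs, rewards, dones, info = warehouse.step(joint_action)
```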
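
The Experiment Setup row gives a single stopping rule: train and evaluate up to 10,000 steps, or stop earlier once the policy reaches the expected reward. A minimal sketch of that rule follows; the `train_step` and `evaluate` hooks, the evaluation interval, and the convergence tolerance are placeholders, since the paper does not report them.

```python
MAX_STEPS = 10_000      # stated in the paper's experiment setup
TOLERANCE = 0.05        # assumed convergence margin (not reported in the paper)

def run_training(train_step, evaluate, expected_reward, eval_every=100):
    """Train until MAX_STEPS or until the evaluated reward reaches the
    expected reward, whichever occurs first.

    train_step, evaluate, and expected_reward are placeholder hooks for the
    learner update, the evaluation routine, and the domain-specific target."""
    for step in range(1, MAX_STEPS + 1):
        train_step()
        if step % eval_every == 0 and evaluate() >= expected_reward - TOLERANCE:
            return step            # converged to the expected reward early
    return MAX_STEPS               # hit the step budget without early convergence
```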