Multi-Agent Adversarial Inverse Reinforcement Learning

Authors: Lantao Yu, Jiaming Song, Stefano Ermon

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experiments: We seek to answer the following questions via empirical evaluation: (1) Can MA-AIRL efficiently recover the expert policies for each individual agent from the expert demonstrations (policy imitation)? (2) Can MA-AIRL effectively recover the underlying reward functions, for which the expert policies form a LSBRE (reward recovery)? ... The results for cooperative and competitive environments are shown in Tables 1 and 2 respectively."
Researcher Affiliation | Academia | Lantao Yu, Jiaming Song, Stefano Ermon; Department of Computer Science, Stanford University, Stanford, CA 94305 USA.
Pseudocode | Yes | "Algorithm 1 Multi-Agent Adversarial IRL" (a hedged sketch of the corresponding training step appears below the table).
Open Source Code | Yes | The codebase for this work can be found at https://github.com/ermongroup/MA-AIRL.
Open Datasets | Yes | Task Description: "To answer these questions, we evaluate our MA-AIRL algorithm on a series of simulated particle environments (Lowe et al., 2017)." (a demonstration-collection sketch appears below the table)
Dataset Splits | No | No explicit train/validation/test splits with percentages or counts are specified for a static dataset. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps, which is close to the amount of time steps used in (Ho & Ermon, 2016)."
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or machine specifications) used for running the experiments are provided.
Software Dependencies | No | No software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) are provided. The paper mentions using "a multi-agent version of ACKTR (Wu et al., 2017; Song et al., 2018)" but gives no version information.
Experiment Setup | No | Specific experimental setup details such as hyperparameter values (learning rates, batch sizes, optimizer settings, etc.) are not provided. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps" and "we use behavior cloning to pretrain MA-AIRL and MA-GAIL." (a behavior-cloning sketch appears below the table)
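To make the Pseudocode row concrete, here is a minimal sketch of a per-agent AIRL-style discriminator and its classification update, following the structure of Algorithm 1. It is written in PyTorch purely for illustration (the released codebase uses a different framework); the network sizes, variable names, and the discriminator_step helper are assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a per-agent AIRL-style discriminator (Algorithm 1).
# Sizes, names, and the training helper below are illustrative assumptions.
import torch
import torch.nn as nn

class AIRLDiscriminator(nn.Module):
    """D_i(s, a, s') = exp(f_i) / (exp(f_i) + pi_i(a|s)),
    where f_i(s, a, s') = g_i(s, a) + gamma * h_i(s') - h_i(s)."""

    def __init__(self, obs_dim, act_dim, hidden=64, gamma=0.99):
        super().__init__()
        self.gamma = gamma
        self.g = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                               nn.Tanh(), nn.Linear(hidden, 1))  # reward term
        self.h = nn.Sequential(nn.Linear(obs_dim, hidden),
                               nn.Tanh(), nn.Linear(hidden, 1))  # shaping term

    def f(self, obs, act, next_obs):
        g = self.g(torch.cat([obs, act], dim=-1))
        return g + self.gamma * self.h(next_obs) - self.h(obs)

    def logits(self, obs, act, next_obs, log_pi):
        # log D - log(1 - D) reduces to f - log pi_i(a|s)
        return self.f(obs, act, next_obs) - log_pi

def discriminator_step(disc, opt, expert_batch, policy_batch):
    """One binary-classification update: expert transitions get label 1,
    samples from the current policies get label 0.  Hypothetical batch
    layout: each batch is a tuple (obs, act, next_obs, log_pi)."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc.logits(*expert_batch)
    pol_logits = disc.logits(*policy_batch)
    loss = (bce(exp_logits, torch.ones_like(exp_logits)) +
            bce(pol_logits, torch.zeros_like(pol_logits)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In the full algorithm, each agent alternates such discriminator updates with policy updates (a multi-agent version of ACKTR in the paper), using the learned f_i or the log D_i - log(1 - D_i) signal as its reward.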
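The expert-demonstration quantities quoted above (200 episodes of 50 time steps each) could be collected along the lines below. This sketch assumes the OpenAI multiagent-particle-envs package (its make_env helper), an illustrative scenario name, and already-trained per-agent expert policies expert_policies; none of these details are specified by the paper beyond the episode and horizon counts.

```python
# Hedged sketch of collecting 200 expert episodes of 50 steps each on a
# particle environment.  Scenario name and `expert_policies` are hypothetical.
from make_env import make_env  # helper from openai/multiagent-particle-envs

def collect_demonstrations(expert_policies, scenario="simple_spread",
                           n_episodes=200, horizon=50):
    env = make_env(scenario)
    demos = []                       # one list of transitions per episode
    for _ in range(n_episodes):
        obs_n = env.reset()          # list of per-agent observations
        episode = []
        for _ in range(horizon):
            act_n = [pi(o) for pi, o in zip(expert_policies, obs_n)]
            next_obs_n, rew_n, done_n, _ = env.step(act_n)
            episode.append((obs_n, act_n, next_obs_n))
            obs_n = next_obs_n
        demos.append(episode)
    return demos
```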
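Finally, the behavior-cloning pretraining mentioned in the Experiment Setup row can be sketched as a per-agent maximum-likelihood fit to the expert state-action pairs. The policy interface (a module returning a torch.distributions object), epoch count, and learning rate are assumptions for illustration; the paper does not report these values.

```python
# Hedged sketch of per-agent behavior-cloning pretraining.  The policy
# interface, epoch count, and learning rate are illustrative assumptions.
import torch

def pretrain_bc(policy, expert_obs, expert_act, epochs=100, lr=1e-3):
    """Maximize the log-likelihood of expert actions under the policy.
    `policy(obs)` is assumed to return a torch.distributions distribution."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        dist = policy(expert_obs)                  # e.g. a Normal over actions
        loss = -dist.log_prob(expert_act).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```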