Multi-Agent Adversarial Inverse Reinforcement Learning

Authors: Lantao Yu, Jiaming Song, Stefano Ermon

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "5. Experiments: We seek to answer the following questions via empirical evaluation: (1) Can MA-AIRL efficiently recover the expert policies for each individual agent from the expert demonstrations (policy imitation)? (2) Can MA-AIRL effectively recover the underlying reward functions, for which the expert policies form a LSBRE (reward recovery)? ... The results for cooperative and competitive environments are shown in Tables 1 and 2 respectively."
Researcher Affiliation | Academia | Lantao Yu, Jiaming Song, Stefano Ermon; Department of Computer Science, Stanford University, Stanford, CA 94305 USA.
Pseudocode | Yes | "Algorithm 1 Multi-Agent Adversarial IRL" (a hedged sketch of the corresponding training step appears below the table).
Open Source Code | Yes | The codebase for this work can be found at https://github.com/ermongroup/MA-AIRL.
Open Datasets | Yes | Task Description: "To answer these questions, we evaluate our MA-AIRL algorithm on a series of simulated particle environments (Lowe et al., 2017)." (a demonstration-collection sketch appears below the table)
Dataset Splits | No | No explicit train/validation/test splits with percentages or counts are specified for a static dataset. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps, which is close to the amount of time steps used in (Ho & Ermon, 2016)."
Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or machine specifications) used for running the experiments are provided.
Software Dependencies | No | No software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) are provided. The paper mentions using "a multi-agent version of ACKTR (Wu et al., 2017; Song et al., 2018)" but gives no version information.
Experiment Setup | No | Specific experimental setup details such as hyperparameter values (learning rates, batch sizes, optimizer settings, etc.) are not provided. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps" and "we use behavior cloning to pretrain MA-AIRL and MA-GAIL." (a behavior-cloning sketch appears below the table)
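To make the Pseudocode row concrete, here is a minimal sketch of a per-agent AIRL-style discriminator and its classification update, following the structure of Algorithm 1. It is written in PyTorch purely for illustration (the released codebase uses a different framework); the network sizes, variable names, and the discriminator_step helper are assumptions, not the authors' exact implementation.

```python
# Hedged sketch of a per-agent AIRL-style discriminator (Algorithm 1).
# Sizes, names, and the training helper below are illustrative assumptions.
import torch
import torch.nn as nn

class AIRLDiscriminator(nn.Module):
    """D_i(s, a, s') = exp(f_i) / (exp(f_i) + pi_i(a|s)),
    where f_i(s, a, s') = g_i(s, a) + gamma * h_i(s') - h_i(s)."""

    def __init__(self, obs_dim, act_dim, hidden=64, gamma=0.99):
        super().__init__()
        self.gamma = gamma
        self.g = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden),
                               nn.Tanh(), nn.Linear(hidden, 1))  # reward term
        self.h = nn.Sequential(nn.Linear(obs_dim, hidden),
                               nn.Tanh(), nn.Linear(hidden, 1))  # shaping term

    def f(self, obs, act, next_obs):
        g = self.g(torch.cat([obs, act], dim=-1))
        return g + self.gamma * self.h(next_obs) - self.h(obs)

    def logits(self, obs, act, next_obs, log_pi):
        # log D - log(1 - D) reduces to f - log pi_i(a|s)
        return self.f(obs, act, next_obs) - log_pi

def discriminator_step(disc, opt, expert_batch, policy_batch):
    """One binary-classification update: expert transitions get label 1,
    samples from the current policies get label 0.  Hypothetical batch
    layout: each batch is a tuple (obs, act, next_obs, log_pi)."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits = disc.logits(*expert_batch)
    pol_logits = disc.logits(*policy_batch)
    loss = (bce(exp_logits, torch.ones_like(exp_logits)) +
            bce(pol_logits, torch.zeros_like(pol_logits)))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

In the full algorithm, each agent alternates such discriminator updates with policy updates (a multi-agent version of ACKTR in the paper), using the learned f_i or the log D_i - log(1 - D_i) signal as its reward.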
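The expert-demonstration quantities quoted above (200 episodes of 50 time steps each) could be collected along the lines below. This sketch assumes the OpenAI multiagent-particle-envs package (its make_env helper), an illustrative scenario name, and already-trained per-agent expert policies expert_policies; none of these details are specified by the paper beyond the episode and horizon counts.

```python
# Hedged sketch of collecting 200 expert episodes of 50 steps each on a
# particle environment.  Scenario name and `expert_policies` are hypothetical.
from make_env import make_env  # helper from openai/multiagent-particle-envs

def collect_demonstrations(expert_policies, scenario="simple_spread",
                           n_episodes=200, horizon=50):
    env = make_env(scenario)
    demos = []                       # one list of transitions per episode
    for _ in range(n_episodes):
        obs_n = env.reset()          # list of per-agent observations
        episode = []
        for _ in range(horizon):
            act_n = [pi(o) for pi, o in zip(expert_policies, obs_n)]
            next_obs_n, rew_n, done_n, _ = env.step(act_n)
            episode.append((obs_n, act_n, next_obs_n))
            obs_n = next_obs_n
        demos.append(episode)
    return demos
```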
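Finally, the behavior-cloning pretraining mentioned in the Experiment Setup row can be sketched as a per-agent maximum-likelihood fit to the expert state-action pairs. The policy interface (a module returning a torch.distributions object), epoch count, and learning rate are assumptions for illustration; the paper does not report these values.

```python
# Hedged sketch of per-agent behavior-cloning pretraining.  The policy
# interface, epoch count, and learning rate are illustrative assumptions.
import torch

def pretrain_bc(policy, expert_obs, expert_act, epochs=100, lr=1e-3):
    """Maximize the log-likelihood of expert actions under the policy.
    `policy(obs)` is assumed to return a torch.distributions distribution."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    for _ in range(epochs):
        dist = policy(expert_obs)                  # e.g. a Normal over actions
        loss = -dist.log_prob(expert_act).sum(-1).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy
```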