Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Multi-Agent Adversarial Inverse Reinforcement Learning
Authors: Lantao Yu, Jiaming Song, Stefano Ermon
ICML 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5. Experiments We seek to answer the following questions via empirical evaluation: (1) Can MA-AIRL efficiently recover the expert policies for each individual agent from the expert demonstrations (policy imitation)? (2) Can MA-AIRL effectively recover the underlying reward functions, for which the expert policies form a LSBRE (reward recovery)? ... The results for cooperative and competitive environments are shown in Tables 1 and 2 respectively. |
| Researcher Affiliation | Academia | Lantao Yu¹, Jiaming Song¹, Stefano Ermon¹. ¹Department of Computer Science, Stanford University, Stanford, CA 94305 USA. |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Adversarial IRL |
| Open Source Code | Yes | The codebase for this work can be found at https://github.com/ermongroup/MA-AIRL. |
| Open Datasets | Yes | Task Description To answer these questions, we evaluate our MA-AIRL algorithm on a series of simulated particle environments (Lowe et al., 2017). |
| Dataset Splits | No | No explicit train/validation/test dataset splits were specified with percentages or counts for a static dataset. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps, which is close to the amount of time steps used in (Ho & Ermon, 2016)." |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) were provided. The paper mentions using "a multi-agent version of ACKTR (Wu et al., 2017; Song et al., 2018)" but without specific version information. |
| Experiment Setup | No | Specific experimental setup details such as hyperparameter values (learning rates, batch sizes, optimizer settings, etc.) were not provided. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps" and "we use behavior cloning to pretrain MA-AIRL and MA-GAIL." |