Multi-Agent Adversarial Inverse Reinforcement Learning
Authors: Lantao Yu, Jiaming Song, Stefano Ermon
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 (Experiments): "We seek to answer the following questions via empirical evaluation: (1) Can MA-AIRL efficiently recover the expert policies for each individual agent from the expert demonstrations (policy imitation)? (2) Can MA-AIRL effectively recover the underlying reward functions, for which the expert policies form a LSBRE (reward recovery)? ... The results for cooperative and competitive environments are shown in Tables 1 and 2 respectively." |
| Researcher Affiliation | Academia | Lantao Yu, Jiaming Song, Stefano Ermon; Department of Computer Science, Stanford University, Stanford, CA 94305 USA. |
| Pseudocode | Yes | Algorithm 1 Multi-Agent Adversarial IRL (an illustrative sketch of this alternating discriminator/policy update appears after the table). |
| Open Source Code | Yes | The codebase for this work can be found at https://github.com/ermongroup/MA-AIRL. |
| Open Datasets | Yes | Task Description: "To answer these questions, we evaluate our MA-AIRL algorithm on a series of simulated particle environments (Lowe et al., 2017)." |
| Dataset Splits | No | No explicit train/validation/test splits (percentages or counts) were specified for a static dataset. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps, which is close to the amount of time steps used in (Ho & Ermon, 2016)." |
| Hardware Specification | No | No specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1) were provided. The paper mentions using "a multi-agent version of ACKTR (Wu et al., 2017; Song et al., 2018)" but without specific version information. |
| Experiment Setup | No | Specific experimental setup details such as hyperparameter values (learning rates, batch sizes, optimizer settings, etc.) were not provided. The paper states: "we use 200 episodes of expert demonstrations, each with 50 time steps" and "we use behavior cloning to pretrain MA-AIRL and MA-GAIL." |
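
The Pseudocode row above points to Algorithm 1 (Multi-Agent Adversarial IRL), which alternates between fitting per-agent discriminators against expert demonstrations and updating the agents' policies with the induced rewards. The Python sketch below is only an assumed illustration of that structure under an AIRL-style discriminator D_i(s, a_i) = exp(f_i(s, a_i)) / (exp(f_i(s, a_i)) + pi_i(a_i | s)); the network sizes, the toy batches, and the helper name `discriminator_logit` are hypothetical and are not taken from the authors' released code.

```python
# Minimal sketch (assumed, not the authors' implementation) of the alternating
# update in Algorithm 1 of MA-AIRL, using an AIRL-style discriminator per agent:
#   D_i(s, a_i) = exp(f_i(s, a_i)) / (exp(f_i(s, a_i)) + pi_i(a_i | s))
# so that logit(D_i) = f_i(s, a_i) - log pi_i(a_i | s).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, ACT_DIM = 2, 8, 2  # illustrative sizes, not from the paper

# Per-agent reward networks playing the role of f_{omega_i}.
f_nets = [nn.Sequential(nn.Linear(OBS_DIM + ACT_DIM, 64), nn.Tanh(), nn.Linear(64, 1))
          for _ in range(N_AGENTS)]
optims = [torch.optim.Adam(f.parameters(), lr=1e-3) for f in f_nets]

def discriminator_logit(f_net, obs, act, log_pi):
    """Return log D - log(1 - D) = f(s, a) - log pi(a | s)."""
    return f_net(torch.cat([obs, act], dim=-1)).squeeze(-1) - log_pi

for iteration in range(3):  # a few illustrative iterations
    # Placeholder batches; in practice these come from the expert demonstrations
    # and from rollouts of the current generator policies.
    expert_obs, expert_act = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM)
    policy_obs, policy_act = torch.randn(32, OBS_DIM), torch.randn(32, ACT_DIM)
    expert_log_pi = torch.zeros(32)  # log pi_i(a_i | s) under the current policy
    policy_log_pi = torch.zeros(32)

    for i in range(N_AGENTS):
        # Discriminator step: expert samples are labeled 1, generator samples 0.
        logits_exp = discriminator_logit(f_nets[i], expert_obs, expert_act, expert_log_pi)
        logits_gen = discriminator_logit(f_nets[i], policy_obs, policy_act, policy_log_pi)
        loss = (nn.functional.binary_cross_entropy_with_logits(logits_exp, torch.ones(32)) +
                nn.functional.binary_cross_entropy_with_logits(logits_gen, torch.zeros(32)))
        optims[i].zero_grad()
        loss.backward()
        optims[i].step()

        # Generator step (omitted): update agent i's policy, e.g. with a
        # multi-agent ACKTR variant, using the induced reward below.
        with torch.no_grad():
            reward_i = discriminator_logit(f_nets[i], policy_obs, policy_act, policy_log_pi)
```

In the full algorithm the generator step would be a multi-agent ACKTR update driven by r_i = log D_i - log(1 - D_i); the sketch only indicates that step in comments, since its purpose is to show the discriminator objective and the reward it induces.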