Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-Agent Generative Adversarial Imitation Learning
Authors: Jiaming Song, Hongyu Ren, Dorsa Sadigh, Stefano Ermon
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 Experiments. We evaluate the performance of (centralized, decentralized, and zero-sum versions) of MAGAIL under two types of environments. One is a particle environment which allows for complex interactions and behaviors; the other is a control task, where multiple agents try to cooperate and move a plank forward. We collect results by averaging over 5 random seeds. Our implementation is based on Open AI baselines [33]; please refer to Appendix C for implementation details. |
| Researcher Affiliation | Academia | Jiaming Song Stanford University EMAIL Hongyu Ren Stanford University EMAIL Dorsa Sadigh Stanford University EMAIL Stefano Ermon Stanford University EMAIL |
| Pseudocode | Yes | We outline the algorithm Multi-Agent GAIL (MAGAIL) in Appendix B. |
| Open Source Code | Yes | Code for reproducing the experiments are in https://github.com/ermongroup/multiagent-gail. |
| Open Datasets | Yes | We first consider the particle environment proposed in [14], which consists of several agents and landmarks. |
| Dataset Splits | No | The paper mentions using "100 to 400 episodes of expert demonstrations, each with 50 timesteps" but does not provide specific train/validation/test dataset splits or cross-validation details for these demonstrations. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware used for running the experiments, such as GPU models, CPU models, or cloud computing specifications. |
| Software Dependencies | No | "Our implementation is based on Open AI baselines [33]; please refer to Appendix C for implementation details." (The reference [33] is 'Openai baselines. https://github.com/openai/baselines, 2017.') This mentions a software library but does not provide specific version numbers for it or other key dependencies. |
| Experiment Setup | Yes | "We collect results by averaging over 5 random seeds." and "We consider 100 to 400 episodes of expert demonstrations, each with 50 timesteps, which is close to the amount of timesteps used for the control tasks in [16]." and "Following [34], we pretrain our Multi-Agent GAIL methods and the GAIL baseline using behavior cloning as initialization to reduce sample complexity for exploration." |