Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
Authors: Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments are provided. We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a). |
| Researcher Affiliation | Academia | Minshuo Chen¹, Yan Li¹, Ethan Wang¹, Zhuoran Yang², Zhaoran Wang³, Tuo Zhao¹; ¹Georgia Tech, ²University of California, Berkeley, ³Northwestern University |
| Pseudocode | Yes | Algorithm 1 Pessimistic Mean-Field Value Iteration (SAFARI) |
| Open Source Code | No | Sample code is also available at (followed by an empty link in the PDF). Extension to online setting is provided in a longer technical report version, which is available upon request. |
| Open Datasets | Yes | We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a). |
| Dataset Splits | No | No explicit statement of train/validation/test dataset splits was found. The paper mentions using "n = 500 sample episodes of experience data" for training, but does not detail how this data is partitioned for validation purposes or whether there is a specific validation set. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for experiments were provided. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., library versions, programming language versions) were provided. |
| Experiment Setup | Yes | Both the policy and critic networks are implemented as traditional MLPs, with 64 and 512 nodes in a single hidden layer, respectively, and we use parameter sharing for policy networks. |
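
The experiment setup quoted in the table (single-hidden-layer MLPs with 64 policy nodes and 512 critic nodes, and one policy network shared across agents) can be illustrated with a minimal pure-Python sketch. Only the hidden widths and the parameter sharing come from the quoted text. The observation size, action count, agent count, ReLU activation, softmax output, critic input layout, and initialization scheme are all assumptions chosen for illustration, and this is not the authors' code.

```python
import math
import random


def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """Build weights for an MLP with exactly one hidden layer."""
    def layer(n_in, n_out):
        # Uniform initialization scaled by fan-in (an assumption, not from the paper).
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        return weights, biases

    return {"hidden": layer(in_dim, hidden_dim),
            "out": layer(hidden_dim, out_dim)}


def forward(params, x):
    """Forward pass: linear -> ReLU -> linear (ReLU is an assumption)."""
    w1, b1 = params["hidden"]
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    w2, b2 = params["out"]
    return [sum(w * hi for w, hi in zip(row, h)) + b
            for row, b in zip(w2, b2)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical problem sizes; the summary above does not state them.
OBS_DIM, NUM_ACTIONS, NUM_AGENTS = 10, 5, 3

rng = random.Random(0)

# Widths from the quoted setup: 64 hidden nodes (policy), 512 (critic).
shared_policy = init_mlp(OBS_DIM, 64, NUM_ACTIONS, rng)
# Assumed critic input: an observation concatenated with an action distribution.
critic = init_mlp(OBS_DIM + NUM_ACTIONS, 512, 1, rng)

# Parameter sharing: every agent is evaluated with the same policy weights.
observations = [[rng.gauss(0.0, 1.0) for _ in range(OBS_DIM)]
                for _ in range(NUM_AGENTS)]
action_probs = [softmax(forward(shared_policy, obs)) for obs in observations]
values = [forward(critic, obs + probs)[0]
          for obs, probs in zip(observations, action_probs)]
```

Because all agents call `forward` with the same `shared_policy` object, the number of policy parameters does not grow with the number of agents, which is the practical effect of parameter sharing.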