Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

Authors: Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

NeurIPS 2021 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are provided. We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a).
Researcher Affiliation | Academia | Minshuo Chen (1), Yan Li (1), Ethan Wang (1), Zhuoran Yang (2), Zhaoran Wang (3), Tuo Zhao (1); (1) Georgia Tech, (2) University of California, Berkeley, (3) Northwestern University
Pseudocode | Yes | Algorithm 1: Pessimistic Mean-Field Value Iteration (SAFARI)
Open Source Code | No | Sample code is also available at (followed by an empty link in the PDF). Extension to the online setting is provided in a longer technical report version, which is available upon request.
Open Datasets | Yes | We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a).
Dataset Splits | No | No explicit statement of train/validation/test dataset splits was found. The paper mentions using "n = 500 sample episodes of experience data" for training, but does not detail how this data is partitioned for validation purposes or whether there is a specific validation set.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) used for the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library versions, programming language versions) were provided.
Experiment Setup | Yes | Both the policy and critic networks are implemented as traditional MLPs, with 64 and 512 nodes in a single hidden layer, respectively, and we use parameter sharing for policy networks.
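The reported setup (single-hidden-layer MLPs with 64 units for the policy network and 512 for the critic, with parameter sharing across policy networks) can be sketched as below. Only the hidden widths come from the paper's excerpt; the observation/action dimensions, the tanh activation, and the critic's input (observation concatenated with policy output) are illustrative assumptions, not details from the paper.

```python
import numpy as np

def init_mlp(in_dim, hidden_dim, out_dim, rng):
    """One-hidden-layer MLP, matching the 'traditional MLP' description."""
    return {
        "W1": rng.standard_normal((in_dim, hidden_dim)) * 0.1,
        "b1": np.zeros(hidden_dim),
        "W2": rng.standard_normal((hidden_dim, out_dim)) * 0.1,
        "b2": np.zeros(out_dim),
    }

def forward(params, x):
    h = np.tanh(x @ params["W1"] + params["b1"])  # tanh is an assumed activation
    return h @ params["W2"] + params["b2"]

rng = np.random.default_rng(0)
# Hidden widths 64 (policy) and 512 (critic) are from the paper's excerpt;
# obs_dim=10 and act_dim=5 are hypothetical placeholders.
policy = init_mlp(10, 64, 5, rng)       # one shared policy net (parameter sharing)
critic = init_mlp(10 + 5, 512, 1, rng)  # critic scores an (obs, action) pair

obs = rng.standard_normal(10)
logits = forward(policy, obs)
value = forward(critic, np.concatenate([obs, logits]))
```

Under parameter sharing, every agent evaluates the same `policy` dictionary on its own observation, which is what makes the single 64-unit network sufficient for the whole population.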