Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL

Authors: Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments are provided. We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a).
Researcher Affiliation | Academia | Minshuo Chen (Georgia Tech), Yan Li (Georgia Tech), Ethan Wang (Georgia Tech), Zhuoran Yang (University of California, Berkeley), Zhaoran Wang (Northwestern University), Tuo Zhao (Georgia Tech)
Pseudocode | Yes | Algorithm 1 Pessimistic Mean-Field Value Iteration (SAFARI) (see the value-iteration sketch below the table)
Open Source Code | No | Sample code is also available at (followed by an empty link in the PDF). Extension to online setting is provided in a longer technical report version, which is available upon request.
Open Datasets | Yes | We perform experiments on the multi-agent particle environment (MPE, Lowe et al. (2017)), a popular benchmark used in prior work (Mordatch and Abbeel, 2018; Liu et al., 2020a).
Dataset Splits | No | No explicit statement of train/validation/test dataset splits was found. The paper mentions using “n = 500 sample episodes of experience data” for training, but does not describe how this data is partitioned or whether a separate validation set is used.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for the experiments were provided.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., library or programming-language versions) were provided.
Experiment Setup | Yes | Both the policy and critic networks are implemented as traditional MLPs, with 64 and 512 nodes in a single hidden layer, respectively, and we use parameter sharing for policy networks. (See the network sketch below the table.)
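For the Pseudocode row above: Algorithm 1 (SAFARI) is not reproduced here, but the core mechanism it builds on, value iteration with an uncertainty penalty subtracted from the empirical Bellman backup, can be illustrated in a few lines. The following is a minimal tabular sketch; the penalty form, truncation, data structures, and function name are assumptions for illustration, not the paper's exact procedure (in particular, the mean-field/permutation-invariant state representation is abstracted away into the index s).

```python
import numpy as np

def pessimistic_value_iteration(P_hat, r_hat, counts, horizon, beta=1.0):
    """Illustrative pessimistic value iteration (NOT the paper's exact SAFARI).

    P_hat:  (S, A, S) transition probabilities estimated from offline data
    r_hat:  (S, A)    mean rewards estimated from offline data
    counts: (S, A)    state-action visitation counts in the offline dataset
    """
    S, A = r_hat.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    # Assumed penalty: shrinks as offline coverage of (s, a) grows.
    Gamma = beta / np.sqrt(np.maximum(counts, 1))

    for h in reversed(range(horizon)):
        # Empirical Bellman backup minus the pessimism penalty.
        Q = r_hat + P_hat @ V - Gamma        # shape (S, A)
        Q = np.maximum(Q, 0.0)               # keep estimates non-negative
        policy[h] = np.argmax(Q, axis=1)     # act greedily w.r.t. pessimistic Q
        V = Q[np.arange(S), policy[h]]
    return policy, V
```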
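For the Experiment Setup row, here is a minimal PyTorch sketch of the reported network sizes. Only the hidden widths (64 for the policy, 512 for the critic) and the use of a single shared policy network come from the paper; the input/output dimensions, activation, and agent count below are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PolicyMLP(nn.Module):
    """Single-hidden-layer policy network (64 hidden units, as reported)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        # Categorical policy over discrete MPE actions (assumed).
        return torch.distributions.Categorical(logits=self.net(obs))

class CriticMLP(nn.Module):
    """Single-hidden-layer critic network (512 hidden units, as reported)."""
    def __init__(self, in_dim, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x):
        return self.net(x)

# Parameter sharing for policy networks: all agents query one module,
# so there is a single set of policy weights (dimensions are assumed).
n_agents, obs_dim, n_actions = 3, 18, 5
shared_policy = PolicyMLP(obs_dim, n_actions)
policies = [shared_policy] * n_agents
critic = CriticMLP(obs_dim)
```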