Offline Multi-Agent Reinforcement Learning with Implicit Global-to-Local Value Regularization

Authors: Xiangsen Wang, Haoran Xu, Yinan Zheng, Xianyuan Zhan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Based on comprehensive experiments on the offline multi-agent MuJoCo and StarCraft II micro-management tasks, we show that OMIGA achieves superior performance over the state-of-the-art offline MARL methods in almost all tasks. We evaluate our method using various types of offline datasets on both multi-agent MuJoCo [15] and StarCraft Multi-Agent Challenge (SMAC) tasks [16]. Under all settings, OMIGA achieves better performance and enjoys faster convergence compared with other strong baselines.
Researcher Affiliation | Academia | Xiangsen Wang¹, Haoran Xu², Yinan Zheng³, Xianyuan Zhan³,⁴; ¹Beijing Jiaotong University, ²UT Austin, ³Tsinghua University, ⁴Shanghai Artificial Intelligence Laboratory
Pseudocode | Yes | Algorithm 1: Pseudocode of OMIGA
Open Source Code | Yes | Our code is available at https://github.com/ZhengYinan-AIR/OMIGA.
Open Datasets | Yes | We choose multi-agent MuJoCo [15] and StarCraft Multi-Agent Challenge (SMAC) [16] as our experiment environments. The offline dataset we used is provided by Meng et al. [38], which is collected from the online trained MAPPO agents [47], and is the largest open offline dataset on SMAC.
Dataset Splits | No | The paper describes offline datasets of different qualities (expert, medium, medium-replay, medium-expert) but does not specify how these datasets are partitioned into explicit training, validation, and test splits for the experiments.
Hardware Specification | Yes | In this paper, all experiments are implemented with PyTorch and executed on NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions that experiments are "implemented with Pytorch", but it does not specify the version of PyTorch or of any other software dependencies.
Experiment Setup | Yes | The local Q-value, state-value, and policy networks of OMIGA are represented by 3-layer ReLU-activated MLPs with 256 units in each hidden layer. For the weight network, we use 2-layer ReLU-activated MLPs with 64 units in each hidden layer. All networks are optimized by the Adam optimizer. Hyperparameters (shared): Q-value network learning rate 5e-4; policy network learning rate 5e-4; optimizer Adam; target update rate 0.005; batch size 128; discount factor 0.99; hidden dimension 256; weight network hidden dimension 64. Hyperparameters (OMIGA): state-value network learning rate 5e-4; regularization parameter α = 1 or 10.
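To make the quoted setup concrete, the following is a minimal PyTorch sketch of the network sizes and optimizer settings listed above. The input/output dimensions, the number of agents, the weight-network learning rate, and the reading of "3-layer" as three hidden layers are illustrative assumptions, not details taken from the paper or its released code.

```python
# Minimal sketch of the reported network/optimizer configuration.
# Dimensions (obs_dim, action_dim, state_dim, n_agents) are placeholders,
# not values from the paper or its repository.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden, n_hidden_layers):
    """ReLU-activated MLP with n_hidden_layers hidden layers of `hidden` units."""
    layers, d = [], in_dim
    for _ in range(n_hidden_layers):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, out_dim))
    return nn.Sequential(*layers)


obs_dim, action_dim, state_dim, n_agents = 17, 6, 48, 3  # placeholder sizes

# Local Q-value, state-value, and policy networks: 3-layer ReLU MLPs, 256 units.
q_net = mlp(obs_dim + action_dim, 1, hidden=256, n_hidden_layers=3)
v_net = mlp(obs_dim, 1, hidden=256, n_hidden_layers=3)
policy_net = mlp(obs_dim, action_dim, hidden=256, n_hidden_layers=3)

# Weight network: 2-layer ReLU MLP, 64 units. Mapping the global state to one
# mixing weight per agent is an assumption made only for this sketch.
weight_net = mlp(state_dim, n_agents, hidden=64, n_hidden_layers=2)

# Adam optimizers with the reported learning rate of 5e-4; the table does not
# list a weight-network learning rate, so matching it to 5e-4 is an assumption.
q_opt = torch.optim.Adam(q_net.parameters(), lr=5e-4)
v_opt = torch.optim.Adam(v_net.parameters(), lr=5e-4)
pi_opt = torch.optim.Adam(policy_net.parameters(), lr=5e-4)
w_opt = torch.optim.Adam(weight_net.parameters(), lr=5e-4)

# Remaining shared settings from the table.
batch_size, gamma, tau, alpha = 128, 0.99, 0.005, 1.0  # alpha is 1 or 10
```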