MFVFD: A Multi-Agent Q-Learning Approach to Cooperative and Non-Cooperative Tasks
Authors: Tianhao Zhang, Qiwei Ye, Jiang Bian, Guangming Xie, Tie-Yan Liu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis on the Hawk-Dove and Nonmonotonic Cooperation matrix games evaluates MFVFD's convergent solution. Empirical studies on challenging mixed cooperative-competitive tasks where hundreds of agents coexist demonstrate that MFVFD significantly outperforms existing baselines. We assessed the performance of MFVFD by comparing it against state-of-the-art MARL algorithms in four environments. First, we consider different types of single-state matrix games, including the Hawk-Dove non-cooperative matrix game and the Nonmonotonic Cooperation matrix game. Results show that our proposed approach converges to the pure Nash Equilibrium (NE) in the non-cooperative game and successfully finds the Pareto-optimal solution in the cooperative game. We then observed its cooperation ability in the Cooperative Navigation environment and further evaluated its performance in a more challenging Mixed Cooperation-Competition game with 400 agents, MAgent [Zheng et al., 2017]. Empirical results show that MFVFD significantly outperforms other multi-agent baselines. To further understand the efficacy of MFVFD, we evaluated it on a range of tasks on FLOW, a traffic control benchmark [Wu et al., 2017]; the results show that MFVFD converges faster than the baseline with better final performance. |
| Researcher Affiliation | Collaboration | ¹Peking University, ²Microsoft Research Asia; {tianhao z, xiegming}@pku.edu.cn, {qiwye, jiabia, tyliu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Mean field value decomposition (a hedged sketch of the decomposition appears after the table) |
| Open Source Code | No | The paper mentions "See the supplementary material for details and related animations." This is too vague to confirm the release of source code for their method. No specific repository link or explicit statement about code release is provided. |
| Open Datasets | Yes | We chose the Hawk-Dove and Nonmonotonic Cooperation matrix games... We then observed its cooperation ability in the Cooperative Navigation environment and further evaluated its performance in a more challenging Mixed Cooperation-Competition game with 400 agents, MAgent [Zheng et al., 2017]. To further understand the efficacy of MFVFD, we evaluated MFVFD on a range of tasks on FLOW, a traffic control benchmark [Wu et al., 2017]. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper describes the model architecture ("simple fully connected networks with 2 hidden layers, where each layer has 64 neurons with ReLU activation") but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The structures of Q_i^LOC and Q_i^MF in practice are simple fully connected networks with 2 hidden layers, where each layer has 64 neurons with ReLU activation. To ensure sufficient data collection in the joint action space, we adopted ϵ-greedy exploration for 50k steps. Each algorithm repeats the experiment five times under the same settings. We trained all algorithms through self-play under the same settings. (Hedged sketches of this architecture and the exploration schedule appear after the table.) |
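
For reference, below is a minimal sketch of what the decomposition named in the Pseudocode row ("Algorithm 1: Mean field value decomposition") and the architecture quoted in the Experiment Setup row could look like. The two heads Q_i^LOC and Q_i^MF and the 2-hidden-layer, 64-unit ReLU networks come from the paper's quoted setup; the additive combination, the conditioning of the mean-field head on the other agents' mean action, and all class and variable names are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MFVFDAgent(nn.Module):
    """Hedged sketch of one agent's value decomposition (not the authors' code).

    Assumes an additive split: Q_i = Q_i^LOC(o_i, a_i) + Q_i^MF(o_i, a_i, a_bar),
    where a_bar is the mean action of the other agents.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Per the Experiment Setup row: 2 hidden layers, 64 ReLU units each.
        self.q_loc = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
        # The mean-field head additionally conditions on the mean action a_bar
        # (assumed to be a length-n_actions distribution over neighbor actions).
        self.q_mf = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, mean_action: torch.Tensor) -> torch.Tensor:
        # Q_i(o_i, a_bar)[a] = Q_i^LOC(o_i)[a] + Q_i^MF(o_i, a_bar)[a]
        return self.q_loc(obs) + self.q_mf(torch.cat([obs, mean_action], dim=-1))
```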
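
The Experiment Setup row also states that ϵ-greedy exploration ran for 50k steps. Below is a minimal annealed ϵ-greedy action selector consistent with that description; only the 50k-step horizon comes from the paper, while the linear schedule and the start/end epsilon values are illustrative assumptions.

```python
import random
import torch

def epsilon_greedy(q_values: torch.Tensor, step: int,
                   anneal_steps: int = 50_000,
                   eps_start: float = 1.0, eps_end: float = 0.05) -> int:
    """Pick an action ϵ-greedily; ϵ decays linearly over `anneal_steps`.

    Only the 50k-step horizon is stated in the paper; the linear schedule
    and the start/end epsilons here are illustrative assumptions.
    """
    frac = min(step / anneal_steps, 1.0)
    eps = eps_start + frac * (eps_end - eps_start)
    if random.random() < eps:
        return random.randrange(q_values.shape[-1])  # explore: uniform random action
    return int(q_values.argmax(dim=-1).item())       # exploit: greedy action
```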