MFVFD: A Multi-Agent Q-Learning Approach to Cooperative and Non-Cooperative Tasks
Authors: Tianhao Zhang, Qiwei Ye, Jiang Bian, Guangming Xie, Tie-Yan Liu
IJCAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis on the Hawk-Dove and Nonmonotonic Cooperation matrix games evaluates MFVFD's convergent solution. Empirical studies on challenging mixed cooperative-competitive tasks where hundreds of agents coexist demonstrate that MFVFD significantly outperforms existing baselines. We assessed the performance of MFVFD by comparing it against state-of-the-art MARL algorithms in four environments. First, we consider different types of single-state matrix games, including the Hawk-Dove non-cooperative matrix game and the Nonmonotonic Cooperation matrix game. Results show that our proposed approach converges to the pure Nash Equilibrium (NE) in the non-cooperative game and successfully finds the Pareto-optimal solution in the cooperative game. We then observed its cooperation ability in the Cooperative Navigation environment and further evaluated its performance in a more challenging Mixed Cooperation-Competition game with 400 agents, MAgent [Zheng et al., 2017]. Empirical results show that MFVFD significantly outperforms other multi-agent baselines. To further understand the efficacy of MFVFD, we evaluated it on a range of tasks on FLOW, a traffic control benchmark [Wu et al., 2017]; the results show that MFVFD converges faster than the baseline with better final performance. |
| Researcher Affiliation | Collaboration | ¹Peking University, ²Microsoft Research Asia; {tianhao z, xiegming}@pku.edu.cn, {qiwye, jiabia, tyliu}@microsoft.com |
| Pseudocode | Yes | Algorithm 1: Mean field value decomposition (a hedged sketch of the decomposition appears after the table) |
| Open Source Code | No | The paper mentions "See the supplementary material for details and related animations." This is too vague to confirm the release of source code for their method. No specific repository link or explicit statement about code release is provided. |
| Open Datasets | Yes | We chose the Hawk-Dove and Nonmonotonic Cooperation matrix games... We then observed its cooperation ability in the Cooperative Navigation environment and further evaluated its performance in a more challenging Mixed Cooperation-Competition game with 400 agents, MAgent [Zheng et al., 2017]. To further understand the efficacy of MFVFD, we evaluated MFVFD on a range of tasks on FLOW, a traffic control benchmark [Wu et al., 2017]. |
| Dataset Splits | No | The paper does not provide specific training/test/validation dataset splits (e.g., percentages or sample counts) for the experiments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper describes the model architecture ("simple fully connected networks with 2 hidden layers, where each layer has 64 neurons with ReLU activation") but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The structures of Q_i^LOC and Q_i^MF in practice are simple fully connected networks with 2 hidden layers, where each layer has 64 neurons with ReLU activation. To ensure sufficient data collection in the joint action space, we adopted ϵ-greedy exploration for 50k steps. Each algorithm repeats the experiment five times under the same settings. We trained all algorithms through self-play under the same settings. (Hedged sketches of this architecture and the exploration schedule appear after the table.) |
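
For reference, below is a minimal sketch of what the decomposition named in the Pseudocode row ("Algorithm 1: Mean field value decomposition") and the architecture quoted in the Experiment Setup row could look like. The two heads Q_i^LOC and Q_i^MF and the 2-hidden-layer, 64-unit ReLU networks come from the paper's quoted setup; the additive combination, the conditioning of the mean-field head on the other agents' mean action, and all class and variable names are our assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class MFVFDAgent(nn.Module):
    """Hedged sketch of one agent's value decomposition (not the authors' code).

    Assumes an additive split: Q_i = Q_i^LOC(o_i, a_i) + Q_i^MF(o_i, a_i, a_bar),
    where a_bar is the mean action of the other agents.
    """

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Per the Experiment Setup row: 2 hidden layers, 64 ReLU units each.
        self.q_loc = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )
        # The mean-field head additionally conditions on the mean action a_bar
        # (assumed to be a length-n_actions distribution over neighbor actions).
        self.q_mf = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor, mean_action: torch.Tensor) -> torch.Tensor:
        # Q_i(o_i, a_bar)[a] = Q_i^LOC(o_i)[a] + Q_i^MF(o_i, a_bar)[a]
        return self.q_loc(obs) + self.q_mf(torch.cat([obs, mean_action], dim=-1))
```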
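
The Experiment Setup row also states that ϵ-greedy exploration ran for 50k steps. Below is a minimal annealed ϵ-greedy action selector consistent with that description; only the 50k-step horizon comes from the paper, while the linear schedule and the start/end epsilon values are illustrative assumptions.

```python
import random
import torch

def epsilon_greedy(q_values: torch.Tensor, step: int,
                   anneal_steps: int = 50_000,
                   eps_start: float = 1.0, eps_end: float = 0.05) -> int:
    """Pick an action ϵ-greedily; ϵ decays linearly over `anneal_steps`.

    Only the 50k-step horizon is stated in the paper; the linear schedule
    and the start/end epsilons here are illustrative assumptions.
    """
    frac = min(step / anneal_steps, 1.0)
    eps = eps_start + frac * (eps_end - eps_start)
    if random.random() < eps:
        return random.randrange(q_values.shape[-1])  # explore: uniform random action
    return int(q_values.argmax(dim=-1).item())       # exploit: greedy action
```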