Efficient Multi-Agent Communication via Shapley Message Value
Authors: Di Xue, Lei Yuan, Zongzhang Zhang, Yang Yu
IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we design experiments to verify that SMS can learn an efficient and targeted communication protocol in multi-agent tasks. We compare SMS with multiple baselines on cooperative tasks, including Listener-Speaker, Hallway [Wang et al., 2020b], and the challenging StarCraft II unit micromanagement benchmark [Samvelyan et al., 2019]. For evaluation, all results are illustrated with median performance and 95% confidence intervals over 5 random seeds. (A sketch of this evaluation protocol appears below the table.) |
| Researcher Affiliation | Collaboration | Di Xue (1), Lei Yuan (1, 2), Zongzhang Zhang (1), and Yang Yu (1, 3). (1) State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing 210023, China; (2) Polixir Technologies, Nanjing 210000, China; (3) Peng Cheng Laboratory, Shenzhen, Guangdong, China |
| Pseudocode | Yes | Interested readers may refer to our pseudo-code in Appendix C. |
| Open Source Code | Yes | Code available at https://github.com/DiXue98/SMS. |
| Open Datasets | Yes | StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. It mentions 'median performance with 95% confidence interval on 5 random seeds' but not explicit splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers like Python 3.8, CPLEX 12.4) needed to replicate the experiment. It only mentions PyMARL2 and the SC2 game version. |
| Experiment Setup | Yes | For SMS we set t_selector = 0.3M. ... The ablation of SMS, i.e., Full Comm, behaves almost the same as our method before 0.4M but deteriorates after that. We think this is because its policy overfits to the redundant messages of irrelevant speakers, converging to a sub-optimal policy. Figure 2(b) also reveals the effect of different sample sizes H (2, 4, 6), used to approximate the SMV, on performance. As the sample size grows, convergence is slightly faster, indicating a more accurate SMV estimate; a larger sample size also costs more computational resources. We finally set the sample size H = 2 in the following experiments as a trade-off between computational complexity and performance. ... To deal with the discrete action space, we apply the Gumbel-Softmax trick [Jang et al., 2016] to reparameterize the stochastic policies as deterministic functions. In order to calculate the SMV, we have to evaluate the value of messages from the message coalition by Q_i(s, a_i^C), where C ⊆ C_i. Since the policy is modeled by a neural network, extra care should be taken to avoid possible correlation between messages, so that each message can be useful on its own. To this end, we train the policy by dropping each message with probability p_drop, a hyper-parameter, in the spirit of the dropout technique [Srivastava et al., 2014]. ... Detailed network architecture and hyper-parameter choices are shown in Appendix B. (Illustrative sketches of the SMV estimation, the Gumbel-Softmax trick, and message dropout appear below the table.) |
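
The sample size H quoted in the setup row controls a Monte Carlo approximation of the Shapley Message Value. Below is a minimal permutation-sampling sketch; the critic `q_value(state, agent_id, coalition)` is a hypothetical stand-in for the paper's Q_i(s, a_i^C) with C ⊆ C_i, and the scheme is the standard sampled-permutation estimator for Shapley values rather than the authors' exact implementation.

```python
import random

def estimate_smv(q_value, state, agent_id, speakers, H=2):
    """Monte Carlo estimate of the Shapley Message Value (SMV) of each
    incoming message via sampled coalition-building permutations.

    q_value(state, agent_id, coalition) is a hypothetical critic that
    scores agent `agent_id`'s action when it conditions only on messages
    from the speakers in `coalition` (a frozenset of speaker ids).
    H is the number of sampled permutations (the paper settles on H = 2).
    """
    smv = {j: 0.0 for j in speakers}
    for _ in range(H):
        order = list(speakers)
        random.shuffle(order)          # one random order of adding messages
        coalition = frozenset()
        prev_q = q_value(state, agent_id, coalition)
        for j in order:
            coalition = coalition | {j}
            cur_q = q_value(state, agent_id, coalition)
            smv[j] += cur_q - prev_q   # marginal contribution of message j
            prev_q = cur_q
    return {j: v / H for j, v in smv.items()}
```

Each permutation costs |C_i| + 1 critic evaluations, i.e., roughly H · (|C_i| + 1) calls per decision, which is consistent with the compute-versus-accuracy trade-off the authors describe for H.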
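
The setup quote applies the Gumbel-Softmax trick [Jang et al., 2016] so that sampling a discrete action becomes differentiable. Here is a self-contained straight-through sketch in PyTorch (an assumption; the temperature `tau` and the manual implementation are illustrative, and `torch.nn.functional.gumbel_softmax` provides equivalent behavior).

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_action(logits, tau=1.0):
    """Straight-through Gumbel-Softmax sample [Jang et al., 2016].

    Perturbs the logits with Gumbel(0, 1) noise, applies a
    temperature-scaled softmax (differentiable), then snaps to a one-hot
    action on the forward pass while keeping the soft gradient.
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    y_soft = F.softmax((logits + gumbel) / tau, dim=-1)
    index = y_soft.argmax(dim=-1, keepdim=True)
    y_hard = torch.zeros_like(y_soft).scatter_(-1, index, 1.0)
    return y_hard + (y_soft - y_soft.detach())  # one-hot forward, soft backward

# e.g. 3 agents, 5 discrete actions each
logits = torch.randn(3, 5, requires_grad=True)
actions = gumbel_softmax_action(logits)  # differentiable one-hot actions
```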
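
To keep each message useful on its own, the authors drop each message during training with probability p_drop. A minimal sketch follows, assuming incoming messages are stacked into an (n_speakers, msg_dim) tensor; whether kept messages are rescaled by 1/(1 − p_drop), as in standard dropout, is not stated, so this version only zeroes dropped ones.

```python
import torch

def drop_messages(messages: torch.Tensor, p_drop: float, training: bool = True):
    """Zero out whole incoming messages independently with probability
    p_drop, in the spirit of dropout [Srivastava et al., 2014].

    messages: tensor of shape (n_speakers, msg_dim). Dropping entire
    messages (rather than single features) decorrelates them, so each
    remains informative alone when coalitions are formed for the SMV.
    """
    if not training or p_drop <= 0.0:
        return messages
    keep = torch.rand(messages.shape[0], 1, device=messages.device) > p_drop
    return messages * keep.to(messages.dtype)
```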
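
Finally, the Research Type row reports medians with 95% confidence intervals over 5 seeds. The paper does not say how the interval is obtained; a seed-level bootstrap is one common recipe, sketched below with NumPy (the function name and the 10,000-resample count are illustrative choices).

```python
import numpy as np

def median_with_ci(per_seed_scores, n_boot=10_000, alpha=0.05, seed=0):
    """Median across seeds plus a bootstrapped (1 - alpha) confidence
    interval, e.g. for final test win rates from 5 training seeds."""
    rng = np.random.default_rng(seed)
    data = np.asarray(per_seed_scores, dtype=float)
    # Resample seeds with replacement and take the median of each resample.
    boot = rng.choice(data, size=(n_boot, data.size), replace=True)
    medians = np.median(boot, axis=1)
    lo, hi = np.quantile(medians, [alpha / 2, 1 - alpha / 2])
    return float(np.median(data)), (float(lo), float(hi))

# e.g. test win rates from 5 random seeds
print(median_with_ci([0.72, 0.68, 0.75, 0.70, 0.74]))
```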