Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Encouraging metric-aware diversity in contrastive representation space

Authors: Tianxu Li, Kun Zhu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluations across a variety of challenging multi-agent tasks demonstrate that WCD outperforms existing state-of-the-art methods, delivering superior performance and enhanced exploration. In this section, we use challenging multi-agent tasks from Pac-Men, SMAC, and SMACv2 to demonstrate the outperformance of our method. We show comparison of our method against the state-of-the-art methods such as value-decomposition methods (QMIX [Rashid et al., 2018] and QTRAN [Son et al., 2019]), role-based diversity methods (RODE [Wang et al., 2020c]), mutual information-based diversity methods (MAVEN [Mahajan et al., 2019], EOI [Jiang and Lu, 2021], SCDS [Li et al., 2021], PMIC [Li et al., 2022], LIPO [Charakorn et al., 2023], and FoX [Jo et al., 2024]), and Wasserstein distance-based diversity methods (MAPD [Hu et al., 2024] and DiCo [Bettini et al., 2024]). Without loss of generality, the comparison results are shown with both the mean and standard deviation of the performance tested across five random seeds. For a fair comparison, we adopt the same hyperparameters and policy network architecture across all methods. More training details and hyperparameters are provided in Appendix K.
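Reporting "the mean and standard deviation of the performance tested across five random seeds" amounts to aggregating one learning curve per seed at each evaluation point. The sketch below shows that aggregation under stated assumptions: the function name `aggregate_over_seeds` and the `curves[seed][point]` layout are hypothetical, and it uses the sample standard deviation, since the source does not say whether sample or population deviation is reported. The numbers in the usage lines are placeholders, not results from the paper.

```python
from statistics import mean, stdev


def aggregate_over_seeds(curves: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-evaluation-point mean and sample standard deviation across seeds.

    curves[s][t] is the test metric (e.g. win rate) of seed s at evaluation
    point t; every seed must have the same number of evaluation points.
    """
    if len(curves) < 2:
        raise ValueError("need at least two seeds for a sample standard deviation")
    if len({len(c) for c in curves}) != 1:
        raise ValueError("all seeds must have the same number of evaluation points")
    # zip(*curves) groups the values of all seeds at the same evaluation point.
    points = list(zip(*curves))
    return [mean(p) for p in points], [stdev(p) for p in points]


# Usage with five placeholder seeds and two evaluation points (not paper data).
curves = [[0.0, 0.5], [0.0, 0.7], [0.0, 0.6], [0.0, 0.4], [0.0, 0.8]]
means, stds = aggregate_over_seeds(curves)
print(means, stds)
```

Plotting `means` as a line with a band of plus or minus `stds` gives the usual shaded-curve presentation of multi-seed MARL results.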
Researcher Affiliation | Academia | Tianxu Li, Kun Zhu: College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China (EMAIL)
Pseudocode | Yes | Appendix C (Pseudocode for WCD): The pseudocode for WCD is given in Algorithm 1. Algorithm 1: Wasserstein Contrastive Diversity (WCD)
Open Source Code | Yes | Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the source code in the supplemental material.
Open Datasets | Yes | We first test our method in Pac-Men, as illustrated in Figure 2a, to investigate the effectiveness of our method in encouraging multi-agent diversity. We then test our method on the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019], a commonly used benchmark for evaluating cooperative MARL algorithms, consisting of various combat scenarios with different difficulties. We further adopt the SMACv2 benchmark [Ellis et al., 2022].
Dataset Splits | Yes | We set the evaluation interval to 10K steps followed by 32 test episodes. We run all methods for 5 million steps in all tested tasks. We employ hard updates to update target networks every 200 episodes in SMAC and SMACv2. In Pac-Men, we utilize soft updates for updating target networks with a momentum of 0.01.
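The schedule in this row reduces to simple arithmetic (5 million steps evaluated every 10K steps gives 500 evaluation points, each with 32 test episodes) plus two target-network update rules. The sketch below illustrates both under assumptions that the source does not state: no extra evaluation is counted at step 0, the "momentum of 0.01" is read as the Polyak coefficient tau placed on the online network, and flat lists of floats stand in for network weights. The function names are hypothetical.

```python
def evaluation_schedule(total_steps: int = 5_000_000,
                        eval_interval: int = 10_000,
                        episodes_per_eval: int = 32) -> tuple[int, int]:
    """Return (number of evaluation points, total test episodes).

    Assumes no extra evaluation at step 0.
    """
    n_evals = total_steps // eval_interval
    return n_evals, n_evals * episodes_per_eval


def soft_update(target: list[float], online: list[float], tau: float = 0.01) -> list[float]:
    """Polyak averaging, element-wise: target <- tau * online + (1 - tau) * target.

    Assumes the "momentum of 0.01" is tau, the weight placed on the online network.
    """
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


def maybe_hard_update(target: list[float], online: list[float],
                      episode: int, period: int = 200) -> list[float]:
    """Copy the online weights into the target network every `period` episodes."""
    if episode > 0 and episode % period == 0:
        return list(online)
    return target


# Usage (toy two-parameter "networks"; values are illustrative only).
print(evaluation_schedule())                 # (500, 16000)
print(soft_update([0.0, 1.0], [1.0, 1.0]))   # target moves 1% toward online
print(maybe_hard_update([0.0], [2.0], 200))  # [2.0]
```

The soft rule (Pac-Men) lets the target track the online network slowly at every update, whereas the hard rule (SMAC, SMACv2) keeps the target frozen between periodic full copies.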
Hardware Specification | Yes | All experiments are performed on an NVIDIA GeForce RTX 4090 GPU.
Software Dependencies | Yes | The SC2.4.10 version of StarCraft II is utilized, and performance comparisons across different versions are not applicable. We implement our method using NumPy and PyTorch.
Experiment Setup | Yes | Appendix K (Training Details and Hyperparameters): In this section, we provide the training details and hyperparameters adopted in our experiments. To implement the one-step prediction method, we use a two-layer MLP with a hidden size of 64 for the encoder g_{θ_e}, followed by batch normalization, and a GRU unit for the autoregressive model g_{θ_g}. We adopt a dual vector with a dimension m of 64 to parameterize the dual function. To integrate our method with QMIX, the intrinsic agent utility network is implemented with a two-layer MLP with a hidden size of 64. We keep other components the same as in QMIX. The policy networks of all agents are implemented with Deep Recurrent Q-Networks.

Table 10: Hyperparameters for Pac-Men, SMAC, and SMACv2
hidden dimension: 64, 128
learning rate: 0.0003, 0.005
optimizer: Adam
target update: 0.01 (soft) in Pac-Men; 200 (hard) in SMAC and SMACv2
batch size: 32, 64
β: 0.03, 0.05
α for WCD+QMIX: Pac-Men 0.01; SMAC 0.005 for 3s5z, 2c_vs_64zg, 8m, 5m_vs_6m, 8m_vs_9m, and 10m_vs_11m, and 0.05 for 7sz, 6h_vs_8z, corridor, and 3s5z_vs_3s6z; SMACv2 0.03
α for WCD+MAPPO: Pac-Men 0.01; SMAC 0.005 for 3s5z, 2c_vs_64zg, 8m, 5m_vs_6m, 8m_vs_9m, and 10m_vs_11m, and 0.03 for 7sz, 6h_vs_8z, corridor, and 3s5z_vs_3s6z; SMACv2 0.03
epsilon anneal time: Pac-Men 200,000; SMAC 200,000 for 3s5z, 2c_vs_64zg, 8m, 5m_vs_6m, 8m_vs_9m, and 10m_vs_11m, and 500,000 for 7sz, 6h_vs_8z, corridor, and 3s5z_vs_3s6z; SMACv2 500,000
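In Table 10, α (for both WCD+QMIX and WCD+MAPPO) and the epsilon anneal time depend on the benchmark and, for SMAC, on the map. A minimal config-lookup sketch of those three-valued rows might look like the following; the dictionary keys and the functions `lookup` and `linear_epsilon` are hypothetical names. Rows whose values do not map unambiguously onto the three columns are left out. Linear annealing and the endpoints 1.0 and 0.05 are assumptions (common in QMIX-style codebases); the table gives only the anneal time.

```python
from typing import Optional

# SMAC scenario groups, as listed in the three-valued rows of Table 10.
GROUP_A = {"3s5z", "2c_vs_64zg", "8m", "5m_vs_6m", "8m_vs_9m", "10m_vs_11m"}
GROUP_B = {"7sz", "6h_vs_8z", "corridor", "3s5z_vs_3s6z"}

# SMAC column: row -> (value for GROUP_A maps, value for GROUP_B maps).
SMAC_PER_MAP = {
    "alpha_wcd_qmix": (0.005, 0.05),
    "alpha_wcd_mappo": (0.005, 0.03),
    "epsilon_anneal_time": (200_000, 500_000),
}

# Pac-Men and SMACv2 columns of the same rows (one value per benchmark).
OTHER_BENCHMARKS = {
    "pac-men": {"alpha_wcd_qmix": 0.01, "alpha_wcd_mappo": 0.01,
                "epsilon_anneal_time": 200_000},
    "smacv2": {"alpha_wcd_qmix": 0.03, "alpha_wcd_mappo": 0.03,
               "epsilon_anneal_time": 500_000},
}


def lookup(benchmark: str, row: str, smac_map: Optional[str] = None):
    """Return the Table 10 value of `row` for a benchmark (and SMAC map)."""
    if benchmark == "smac":
        value_a, value_b = SMAC_PER_MAP[row]
        if smac_map in GROUP_A:
            return value_a
        if smac_map in GROUP_B:
            return value_b
        raise KeyError(f"Table 10 lists no SMAC value for map {smac_map!r}")
    return OTHER_BENCHMARKS[benchmark][row]


def linear_epsilon(step: int, anneal_time: int,
                   start: float = 1.0, finish: float = 0.05) -> float:
    """Linearly anneal epsilon from `start` to `finish` over `anneal_time` steps.

    `start` and `finish` are assumed defaults; Table 10 gives only the anneal time.
    """
    frac = min(step / anneal_time, 1.0)
    return start + frac * (finish - start)


# Usage: epsilon halfway through annealing on corridor (a GROUP_B map).
t = lookup("smac", "epsilon_anneal_time", "corridor")
print(t, linear_epsilon(t // 2, t))
```

Keeping the per-map overrides in one table, rather than scattering them across per-scenario config files, makes it easy to check that every evaluated SMAC map has an α and an anneal time, because an unlisted map raises an error instead of silently falling back to a default.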