Learning Distinguishable Trajectory Representation with Contrastive Loss
Authors: Tianxu Li, Kun Zhu, Juan Li, Yang Zhang
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement CTR on top of QMIX and evaluate its performance in various cooperative multi-agent tasks. The empirical results demonstrate that our proposed CTR yields significant performance improvement over the state-of-the-art methods. |
| Researcher Affiliation | Academia | Tianxu Li (1,2), Kun Zhu (1,2), Juan Li (1), Yang Zhang (1); (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China; (2) Collaborative Innovation Center of Novel Software Technology and Industrialization. {tianxuli, zhukun, yangzhang, juanli}@nuaa.edu.cn |
| Pseudocode | Yes | We refer the reader to Appendix C for the PyTorch-style pseudocode of our proposed CTR. Algorithm 1: PyTorch-style pseudocode for CTR (a hedged contrastive-loss sketch appears after this table). |
| Open Source Code | Yes | Our code can be found in the uploaded supplemental material. |
| Open Datasets | Yes | We evaluate CTR in the Pac-Men, SMAC, and SMACv2 benchmarks. The StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019] is a commonly used benchmark for evaluating cooperative MARL algorithms. SMACv2 [Ellis et al., 2022] enables stochasticity in SMAC scenarios by introducing random team compositions and random start positions. |
| Dataset Splits | No | The paper mentions 'test win rates' and '32 test episodes' but does not explicitly state any train/validation/test dataset splits by percentage or sample count, nor does it explicitly mention a 'validation' set or phase. |
| Hardware Specification | Yes | All experiments are performed using NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper states 'We implement our method with NumPy and PyTorch.' but does not provide specific version numbers for these software libraries. |
| Experiment Setup | Yes | The hyperparameters of CTR and baseline algorithms in Pac-Men, SMAC, and SMACv2 are listed in Table 4. We set the evaluation interval to 10K steps followed by 32 test episodes. We run all methods for 5 million steps. In both SMAC and SMACv2, the target networks are updated via hard updates every 200 episodes. In Pac-Men, the target networks use soft updates at a momentum rate of 0.01 (see the target-network update sketch after the table). |
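
The paper's CTR pseudocode lives in its Appendix C and is not reproduced in this report. As a point of reference for the "contrastive loss over trajectory representations" idea named in the title, the following is a minimal, hypothetical sketch of an InfoNCE-style contrastive loss over per-agent trajectory embeddings; the function name, arguments, and temperature value are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def trajectory_contrastive_loss(anchor_emb, positive_emb, temperature=0.1):
    # anchor_emb, positive_emb: (batch, dim) trajectory embeddings.
    # Row i of each tensor is assumed to encode the same agent/behaviour, so the
    # diagonal of the similarity matrix holds the positive pairs and every
    # off-diagonal entry acts as a negative.
    anchor = F.normalize(anchor_emb, dim=-1)
    positive = F.normalize(positive_emb, dim=-1)
    logits = anchor @ positive.t() / temperature   # (batch, batch) cosine similarities
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Usage with random embeddings:
# loss = trajectory_contrastive_loss(torch.randn(32, 64), torch.randn(32, 64))
```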
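
The quoted experiment setup distinguishes hard target-network updates (every 200 episodes in SMAC and SMACv2) from soft updates with a momentum rate of 0.01 (Pac-Men). The sketch below illustrates both update rules under those assumptions; the helper names and module handles are hypothetical and not taken from the paper's code.

```python
import torch

@torch.no_grad()
def hard_update(target_net, online_net):
    # Copy the online parameters into the target network
    # (applied every 200 episodes in SMAC/SMACv2 per the quoted setup).
    target_net.load_state_dict(online_net.state_dict())

@torch.no_grad()
def soft_update(target_net, online_net, tau=0.01):
    # Exponential moving average toward the online parameters
    # (momentum rate tau = 0.01 in Pac-Men per the quoted setup).
    for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
        t_param.mul_(1.0 - tau).add_(o_param, alpha=tau)
```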