Learning Distinguishable Trajectory Representation with Contrastive Loss

Authors: Tianxu Li, Kun Zhu, Juan Li, Yang Zhang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement CTR on top of QMIX and evaluate its performance in various cooperative multi-agent tasks. The empirical results demonstrate that our proposed CTR yields significant performance improvement over the state-of-the-art methods.
Researcher Affiliation | Academia | Tianxu Li (1,2), Kun Zhu (1,2), Juan Li (1), Yang Zhang (1). (1) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, China; (2) Collaborative Innovation Center of Novel Software Technology and Industrialization. {tianxuli, zhukun, yangzhang, juanli}@nuaa.edu.cn
Pseudocode | Yes | We refer the reader to Appendix C for the PyTorch-style pseudocode of our proposed CTR. Algorithm 1: PyTorch-style pseudocode for CTR
Open Source Code | Yes | Our code can be found in the uploaded supplemental material.
Open Datasets | Yes | We evaluate CTR in Pac-Men, SMAC, and SMACv2 benchmarks. The StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019] is a common-used benchmark for evaluating cooperative MARL algorithms. SMACv2 [Ellis et al., 2022] that enables stochasticity in SMAC scenarios via introducing random team compositions and random start positions.
Dataset Splits | No | The paper mentions 'test win rates' and '32 test episodes' but does not explicitly state any train/validation/test dataset splits by percentage or sample count, nor does it explicitly mention a 'validation' set or phase.
Hardware Specification | Yes | All experiments are performed using NVIDIA GeForce RTX 4090 GPUs.
Software Dependencies | No | The paper states 'We implement our method with NumPy and PyTorch.' but does not provide specific version numbers for these software libraries.
Experiment Setup | Yes | The hyperparameters of CTR and baseline algorithms in Pac-Men, SMAC, and SMACv2 are listed in Table 4. We set the evaluation interval to 10K steps followed by 32 test episodes. We run all methods for 5 million steps. In both SMAC and SMACv2, the target networks are updated via hard updates every 200 episodes. In Pac-Men, the target networks use soft updates at a momentum rate of 0.01.
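The two target-network schedules in the experiment setup can be illustrated with a minimal sketch. It is not the authors' code. It assumes that "momentum rate of 0.01" means a Polyak coefficient tau = 0.01 applied to the online weights, and it uses plain Python lists as a hypothetical stand-in for network parameters so that it runs without PyTorch. The names `hard_update`, `soft_update`, `should_hard_update`, and `num_evaluations` are illustrative only.

```python
from typing import Dict, List

# Hypothetical stand-in for network weights: parameter name -> flat list of floats.
Params = Dict[str, List[float]]


def hard_update(target: Params, online: Params) -> None:
    """Copy every online weight into the target network (SMAC / SMACv2 setting)."""
    for name, values in online.items():
        target[name] = list(values)  # copy so later online changes do not leak in


def soft_update(target: Params, online: Params, tau: float = 0.01) -> None:
    """Polyak averaging (Pac-Men setting, assuming tau is the stated momentum rate).

    target <- (1 - tau) * target + tau * online
    """
    for name, values in online.items():
        target[name] = [
            (1.0 - tau) * t + tau * o for t, o in zip(target[name], values)
        ]


def should_hard_update(episode: int, interval: int = 200) -> bool:
    """True on every `interval`-th completed episode (200 in the reported setup)."""
    return episode > 0 and episode % interval == 0


def num_evaluations(total_steps: int = 5_000_000, eval_interval: int = 10_000) -> int:
    """Number of evaluation points implied by the step budget and interval."""
    return total_steps // eval_interval
```

Under these assumptions, the reported budget of 5 million steps with a 10K-step evaluation interval implies 500 evaluation points of 32 test episodes each, not counting any evaluation at step 0.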