UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers
Authors: Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant improvements over state-of-the-art approaches, with advantageous transfer capability in terms of both performance and training speed (10 times faster). |
| Researcher Affiliation | Collaboration | Siyi Hu1, Fengda Zhu1, Xiaojun Chang1, Xiaodan Liang2,3; 1Monash University, 2Sun Yat-sen University, 3Dark Matter AI Inc. |
| Pseudocode | No | The paper describes the model mathematically and textually, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hhhusiyi-monash/UPDeT |
| Open Datasets | Yes | In the single scenario experiments, we evaluate the model performance on different scenarios from SMAC (Samvelyan et al. (2019))... We also test the model performance in the MAgent Environment (Zheng et al. (2017)). |
| Dataset Splits | No | The paper mentions evaluating model performance on different scenarios but does not provide specific training, validation, and test dataset splits or percentages. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper lists training hyperparameters but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | The transformer block in all experiments consists of 3 heads and 2 transformer layers. The other important training hyperparameters are as follows: batch size 32; test interval 2000; gamma 0.99; buffer size 5000; token dimension (UPDeT) 32; channel dimension (UPDeT) 32; epsilon start 1.0; epsilon end 0.05; RNN hidden dimension 64; target net update interval 200; mixing embedding dimension (QMIX) 32; hypernet layers (QMIX) 2; hypernet embedding (QMIX) 64; mixing embedding dimension (QTRAN) 32; opt loss (QTRAN) 1; nopt min loss (QTRAN) 0.1 |
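The reported hyperparameters can be gathered into a single configuration mapping for reference; a minimal Python sketch (the key names are assumptions chosen for readability, not the identifiers used in the released code):

```python
# Hyperparameters reported in the UPDeT paper, grouped by component.
# Key names are illustrative; they do not mirror the official repository.
config = {
    "transformer": {"heads": 3, "layers": 2},
    "batch_size": 32,
    "test_interval": 2000,
    "gamma": 0.99,
    "buffer_size": 5000,
    "token_dim": 32,          # UPDeT token dimension
    "channel_dim": 32,        # UPDeT channel dimension
    "epsilon_start": 1.0,
    "epsilon_end": 0.05,
    "rnn_hidden_dim": 64,
    "target_update_interval": 200,
    # Mixer-specific settings
    "qmix": {"mixing_embed_dim": 32, "hypernet_layers": 2, "hypernet_embed": 64},
    "qtran": {"mixing_embed_dim": 32, "opt_loss": 1.0, "nopt_min_loss": 0.1},
}
```

Grouping the QMIX and QTRAN settings under their own sub-dictionaries keeps the shared training options separate from mixer-specific ones when swapping value-decomposition methods.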