UPDeT: Universal Multi-agent RL via Policy Decoupling with Transformers
Authors: Siyi Hu, Fengda Zhu, Xiaojun Chang, Xiaodan Liang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on large-scale SMAC multi-agent competitive games demonstrate that the proposed UPDeT-based multi-agent reinforcement learning achieves significant improvements over state-of-the-art approaches, with advantageous transfer capability in terms of both performance and training speed (10 times faster). |
| Researcher Affiliation | Collaboration | Siyi Hu1, Fengda Zhu1, Xiaojun Chang1, Xiaodan Liang2,3; 1Monash University, 2Sun Yat-sen University, 3Dark Matter AI Inc. |
| Pseudocode | No | The paper describes the model mathematically and textually, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/hhhusiyi-monash/UPDeT |
| Open Datasets | Yes | In the single scenario experiments, we evaluate the model performance on different scenarios from SMAC (Samvelyan et al. (2019))... We also test the model performance in the MAgent Environment (Zheng et al. (2017)). |
| Dataset Splits | No | The paper mentions evaluating model performance on different scenarios but does not provide specific training, validation, and test dataset splits or percentages. |
| Hardware Specification | No | The paper does not specify the exact hardware (e.g., CPU, GPU models, or memory) used for running the experiments. |
| Software Dependencies | No | The paper lists training hyperparameters but does not provide specific software dependencies with version numbers (e.g., Python, PyTorch/TensorFlow versions, or other libraries). |
| Experiment Setup | Yes | The transformer block in all experiments consists of 3 heads and 2 transformer layers. The other important training hyperparameters are as follows: batch size 32; test interval 2000; gamma 0.99; buffer size 5000; token dimension (UPDeT) 32; channel dimension (UPDeT) 32; epsilon start 1.0; epsilon end 0.05; RNN hidden dimension 64; target net update interval 200; mixing embedding dimension (QMIX) 32; hypernet layers (QMIX) 2; hypernet embedding (QMIX) 64; mixing embedding dimension (QTRAN) 32; opt loss (QTRAN) 1; nopt min loss (QTRAN) 0.1 |
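The reported hyperparameters can be gathered into a single configuration mapping for reference; a minimal Python sketch (the key names are assumptions chosen for readability, not the identifiers used in the released code):

```python
# Hyperparameters reported in the UPDeT paper, grouped by component.
# Key names are illustrative; they do not mirror the official repository.
config = {
    "transformer": {"heads": 3, "layers": 2},
    "batch_size": 32,
    "test_interval": 2000,
    "gamma": 0.99,
    "buffer_size": 5000,
    "token_dim": 32,          # UPDeT token dimension
    "channel_dim": 32,        # UPDeT channel dimension
    "epsilon_start": 1.0,
    "epsilon_end": 0.05,
    "rnn_hidden_dim": 64,
    "target_update_interval": 200,
    # Mixer-specific settings
    "qmix": {"mixing_embed_dim": 32, "hypernet_layers": 2, "hypernet_embed": 64},
    "qtran": {"mixing_embed_dim": 32, "opt_loss": 1.0, "nopt_min_loss": 0.1},
}
```

Grouping the QMIX and QTRAN settings under their own sub-dictionaries keeps the shared training options separate from mixer-specific ones when swapping value-decomposition methods.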