reproducibilityindex.ai

Regularized Softmax Deep Multi-Agent Q-Learning

Authors: Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson

NeurIPS 2021

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct a series of experiments in the multi-agent particle tasks [18] to answer: (i) How much can our RES method improve over QMIX? (ii) How does RES-QMIX compare against state-of-the-art methods in performance and value estimates? (iii) How sensitive is RES to important hyperparameters and what is the effect of each component? (iv) Can RES be applied to other algorithms? We also evaluate our method on the challenging SMAC benchmark [30] to demonstrate its scalability.
Researcher Affiliation	Academia	Ling Pan¹, Tabish Rashid², Bei Peng³, Longbo Huang¹, Shimon Whiteson². ¹Institute for Interdisciplinary Information Sciences, Tsinghua University (pl17@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn); ²University of Oxford (tabish.rashid@cs.ox.ac.uk, shimon.whiteson@cs.ox.ac.uk); ³University of Liverpool (bei.peng@liverpool.ac.uk)
Pseudocode	Yes	The full algorithm for our approximate softmax is in Appendix B.2.
Open Source Code	Yes	The code is publicly available at https://github.com/ling-pan/RES.
Open Datasets	Yes	We conduct a series of experiments in the multi-agent particle tasks [18] and evaluate it on a set of challenging StarCraft II micromanagement tasks [30].
Dataset Splits	No	The paper uses standard multi-agent particle environments [18] and StarCraft II micromanagement tasks [30], which typically have predefined setups, but it does not explicitly describe training/validation/test splits.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models) used to run the experiments.
Software Dependencies	No	The paper mentions using 'PyMARL [30] implementations and setup' but does not specify particular software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup	Yes	For RES-QMIX, we fix the inverse temperature β to be 0.05 while the regularization coefficient λ is selected based on a grid search over {1e-2, 5e-2, 1e-1, 5e-1} as investigated in Section 5.1.3. A detailed description of the tasks and implementation details is in Appendix E.1, including the Adam optimizer with a learning rate of 5e-4 and a batch size of 64.
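
The hyperparameters reported in the Experiment Setup row can be written down as a small configuration grid. The following is only a minimal sketch: the class name `ResQmixConfig`, its field names, and the helper `build_lambda_grid` are hypothetical and are not taken from the paper, PyMARL, or the RES repository. Only the numeric values (β = 0.05, the λ grid, learning rate 5e-4, batch size 64, Adam) come from the row above.

```python
from dataclasses import dataclass

# Values quoted in the Experiment Setup row; every name below is illustrative.
LAMBDA_GRID = (1e-2, 5e-2, 1e-1, 5e-1)  # candidate regularization coefficients


@dataclass(frozen=True)
class ResQmixConfig:
    """Hypothetical container for the reported RES-QMIX hyperparameters."""
    beta: float = 0.05        # inverse temperature, fixed in the paper
    lam: float = 1e-2         # regularization coefficient, chosen by grid search
    lr: float = 5e-4          # learning rate
    batch_size: int = 64
    optimizer: str = "adam"


def build_lambda_grid():
    """Return one config per candidate lambda; all other values stay fixed."""
    return [ResQmixConfig(lam=lam) for lam in LAMBDA_GRID]


if __name__ == "__main__":
    for cfg in build_lambda_grid():
        print(cfg)
```

Each entry in the returned list would correspond to one training run in the grid search; how a run is launched depends on the actual codebase and is not covered here.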