Regularized Softmax Deep Multi-Agent Q-Learning
Authors: Ling Pan, Tabish Rashid, Bei Peng, Longbo Huang, Shimon Whiteson
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a series of experiments in the multi-agent particle tasks [18] to answer: (i) How much can our RES method improve over QMIX? (ii) How does RES-QMIX compare against state-of-the-art methods in performance and value estimates? (iii) How sensitive is RES to important hyperparameters and what is the effect of each component? (iv) Can RES be applied to other algorithms? We also evaluate our method on the challenging SMAC benchmark [30] to demonstrate its scalability. |
| Researcher Affiliation | Academia | Ling Pan¹, Tabish Rashid², Bei Peng³, Longbo Huang¹, Shimon Whiteson². ¹Institute for Interdisciplinary Information Sciences, Tsinghua University (pl17@mails.tsinghua.edu.cn, longbohuang@tsinghua.edu.cn); ²University of Oxford (tabish.rashid@cs.ox.ac.uk, shimon.whiteson@cs.ox.ac.uk); ³University of Liverpool (bei.peng@liverpool.ac.uk) |
| Pseudocode | Yes | The full algorithm for our approximate softmax is in Appendix B.2. |
| Open Source Code | Yes | The code is publicly available at https://github.com/ling-pan/RES. |
| Open Datasets | Yes | We conduct a series of experiments in the multi-agent particle tasks [18] and evaluate it on a set of challenging StarCraft II micromanagement tasks [30]. |
| Dataset Splits | No | The paper describes using standard multi-agent particle environments [18] and StarCraft II micromanagement tasks [30], which typically have predefined setups, but does not explicitly detail the training/validation/test dataset splits within its text. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU/CPU models) used to run the experiments. |
| Software Dependencies | No | The paper mentions using 'PyMARL [30] implementations and setup' but does not specify particular software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | For RES-QMIX, we fix the inverse temperature β to be 0.05 while the regularization coefficient λ is selected based on a grid search over {1e-2, 5e-2, 1e-1, 5e-1} as investigated in Section 5.1.3. Appendix E.1 gives a detailed description of the tasks and implementation details, including the Adam optimizer with a learning rate of 5e-4 and a batch size of 64. |
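The setup row can be sketched as a small Python configuration. This is a minimal sketch under stated assumptions, not the released code at https://github.com/ling-pan/RES: the names `RESQMIXConfig`, `LAMBDA_GRID`, `candidate_configs`, and `softmax_operator` are hypothetical. `softmax_operator` is the standard Boltzmann softmax over an explicitly enumerated list of action values, included only to illustrate what the inverse temperature β controls; the paper's approximate softmax for large joint-action spaces (Appendix B.2) and its λ-weighted regularization term are not reproduced here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RESQMIXConfig:
    """Hypothetical container for the RES-QMIX settings quoted in the setup row."""
    beta: float = 0.05        # inverse temperature, held fixed
    reg_lambda: float = 1e-2  # placeholder; chosen by grid search in the paper
    lr: float = 5e-4          # Adam learning rate
    batch_size: int = 64


# Grid for the regularization coefficient lambda, as listed in the setup row.
LAMBDA_GRID = [1e-2, 5e-2, 1e-1, 5e-1]


def candidate_configs():
    """Hypothetical helper: one config per lambda value; beta, lr, batch size stay fixed."""
    return [RESQMIXConfig(reg_lambda=lam) for lam in LAMBDA_GRID]


def softmax_operator(q_values, beta):
    """Standard Boltzmann softmax operator (not the paper's approximate version).

    Returns the softmax-weighted average of q_values: beta = 0 gives the mean,
    and large beta approaches the max.
    """
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(beta * (q - m)) for q in q_values]
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, q_values)) / total


if __name__ == "__main__":
    for cfg in candidate_configs():
        print(cfg)
    print(softmax_operator([1.0, 2.0, 10.0], beta=0.05))
```

For example, `softmax_operator([1.0, 2.0, 10.0], beta=0.05)` returns a value between the mean (about 4.33) and the max (10.0). With a small β such as 0.05, the result stays closer to the mean than a hard max would, which is the kind of softening of the max operator that β controls.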