Individual Reward Assisted Multi-Agent Reinforcement Learning
Authors: Li Wang, Yupeng Zhang, Yujing Hu, Weixun Wang, Chongjie Zhang, Yang Gao, Jianye Hao, Tangjie Lv, Changjie Fan
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards. |
| Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2 NetEase Fuxi AI Lab, Hangzhou, China; 3 College of Intelligence and Computing, Tianjin University, Tianjin, China; 4 Noah's Ark Lab, Beijing, China; 5 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. |
| Pseudocode | Yes | The pseudo-code of the IRAT algorithm is shown in Algorithm 1. |
| Open Source Code | Yes | Code is available at https://github.com/MDrW/ICML2022IRAT. |
| Open Datasets | No | The paper describes using well-known environments (Multi-Agent Particle Environment and Google Research Football) that generate data dynamically, but it does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year, or reference to established benchmark datasets) for a publicly available or open static dataset used for training. |
| Dataset Splits | No | The paper describes training and evaluation procedures within dynamic environments (e.g., 'evaluated every 500,000 steps' in Google Research Football), but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or citations to predefined static data partitions needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper mentions the use of Adam optimizer and ReLU activation, but it does not provide specific version numbers for key software components or libraries such as Python, PyTorch, or TensorFlow, which are necessary for full reproducibility. |
| Experiment Setup | Yes | The common parameters of all algorithms in different scenarios are shown in Table B.1, and the parameters of IRAT are shown in Table B.2. ... The hyperparameters chosen for the tested methods in the football experiment are listed in Table B.5. |
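The Software Dependencies row notes that the paper omits version numbers for its key libraries, which hinders reproduction. A minimal, illustrative sketch of the kind of environment report a reproduction would record is shown below; the paper does not name its framework, so the PyTorch probe is an assumption and is handled as optional.

```python
# Illustrative sketch (not from the paper): recording interpreter,
# platform, and library versions plus RNG seeds, the information the
# Software Dependencies row flags as missing for reproducibility.
import platform
import random


def environment_report(seed=0):
    """Collect version info and fix the Python RNG seed."""
    random.seed(seed)  # fix Python's RNG for repeatable sampling
    report = {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "seed": seed,
    }
    # The paper's deep-learning framework is unspecified; probing for
    # PyTorch here is a guess, so treat it as optional.
    try:
        import torch
        report["torch"] = torch.__version__
    except ImportError:
        report["torch"] = "not installed"
    return report


if __name__ == "__main__":
    for key, value in environment_report(seed=42).items():
        print(f"{key}: {value}")
```

Saving such a report alongside training logs lets later readers verify which versions produced a given result, even when the paper itself does not state them.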