Individual Reward Assisted Multi-Agent Reinforcement Learning

Authors: Li Wang, Yupeng Zhang, Yujing Hu, Weixun Wang, Chongjie Zhang, Yang Gao, Jianye Hao, Tangjie Lv, Changjie Fan

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.
Researcher Affiliation | Collaboration | 1 National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; 2 NetEase Fuxi AI Lab, Hangzhou, China; 3 College of Intelligence and Computing, Tianjin University, Tianjin, China; 4 Noah's Ark Lab, Beijing, China; 5 Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | The pseudo-code of the IRAT algorithm is shown in Algorithm 1 (a hedged sketch of the cross-clipped objective that Algorithm 1 describes appears below the table).
Open Source Code | Yes | Code is available at https://github.com/MDrW/ICML2022IRAT.
Open Datasets | No | The paper uses well-known environments (the Multi-Agent Particle Environment and Google Research Football) that generate data dynamically, but it does not provide concrete access information (a specific link, DOI, repository name, formal citation with authors/year, or reference to an established benchmark dataset) for any publicly available static dataset used for training (see the environment sketch below the table).
Dataset Splits | No | The paper describes training and evaluation procedures within dynamic environments (e.g., 'evaluated every 500,000 steps' in Google Research Football), but it does not specify explicit training/validation/test dataset splits with percentages, sample counts, or citations to predefined static data partitions needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and ReLU activations, but it does not provide version numbers for key software components such as Python, PyTorch, or TensorFlow, which are necessary for full reproducibility (a version-logging snippet appears below the table).
Experiment Setup | Yes | The common parameters of all algorithms in different scenarios are shown in Table B.1, and the parameters of IRAT are shown in Table B.2. ... The hyperparameters chosen for the tested methods in the football experiment are listed in Table B.5.
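
Because the method is documented only through Algorithm 1 and the released repository, the sketch below illustrates the cross-clipped, PPO-style objectives that Algorithm 1 revolves around: each agent trains an individual policy on individual-reward advantages and a team policy on team-reward advantages, and each policy is additionally clipped against the other policy's action probabilities so the two reward signals assist each other. This is a minimal sketch under those assumptions, not the authors' implementation; the names (`irat_policy_losses`, `kl_coef`, the squared log-prob gap used as a discrepancy proxy) are hypothetical, and the exact objectives are in the repository linked above.

```python
# Hedged sketch of an IRAT-style update for one agent. Not the authors' code;
# see https://github.com/MDrW/ICML2022IRAT for the reference implementation.
# Inputs: log-probs of the taken actions under the individual policy (pi) and
# team policy (sigma), at sampling time ("old") and under the current
# parameters, plus advantages from individual and team rewards.
import torch

def irat_policy_losses(logp_pi, logp_pi_old, logp_sigma, logp_sigma_old,
                       adv_ind, adv_team, eps=0.2, kl_coef=0.01):
    # Standard PPO clipped surrogate: individual policy on individual rewards.
    ratio_pi = torch.exp(logp_pi - logp_pi_old)
    loss_ind = -torch.min(
        ratio_pi * adv_ind,
        torch.clamp(ratio_pi, 1 - eps, 1 + eps) * adv_ind).mean()

    # Cross term: the individual policy is also clipped against the team
    # policy's action probabilities and driven by team-reward advantages.
    ratio_cross_pi = torch.exp(logp_pi - logp_sigma_old.detach())
    loss_ind_cross = -torch.min(
        ratio_cross_pi * adv_team,
        torch.clamp(ratio_cross_pi, 1 - eps, 1 + eps) * adv_team).mean()

    # Team policy: clipped surrogate on team rewards plus the symmetric
    # cross term driven by individual-reward advantages.
    ratio_sigma = torch.exp(logp_sigma - logp_sigma_old)
    loss_team = -torch.min(
        ratio_sigma * adv_team,
        torch.clamp(ratio_sigma, 1 - eps, 1 + eps) * adv_team).mean()
    ratio_cross_sigma = torch.exp(logp_sigma - logp_pi_old.detach())
    loss_team_cross = -torch.min(
        ratio_cross_sigma * adv_ind,
        torch.clamp(ratio_cross_sigma, 1 - eps, 1 + eps) * adv_ind).mean()

    # Discrepancy penalty (a simple proxy here) that keeps the individual
    # policy from drifting away from the team objective over training.
    kl_proxy = (logp_pi - logp_sigma.detach()).pow(2).mean()

    return loss_ind + loss_ind_cross + kl_coef * kl_proxy, \
           loss_team + loss_team_cross
```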
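On the "Open Datasets" point: both benchmarks generate experience on the fly rather than shipping a static dataset, which is why that criterion is judged not met. The loop below shows how Google Research Football data is typically produced; the scenario name and parameters are assumptions for illustration, not values taken from the paper's configuration tables.

```python
# Hypothetical rollout loop for Google Research Football. Data exists only as
# simulator interactions, so there is nothing static to download or split.
import gfootball.env as football_env

env = football_env.create_environment(
    env_name="academy_3_vs_1_with_keeper",    # assumed scenario name
    representation="simple115v2",             # vector observations
    number_of_left_players_agent_controls=3,  # one observation/action per agent
)

obs = env.reset()
done = False
while not done:
    # Random placeholder actions stand in for the learned policies.
    obs, reward, done, info = env.step(env.action_space.sample())
```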
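On the "Software Dependencies" point, anyone re-running the released code can log the missing version information up front. A small helper along these lines could be recorded alongside results; the choice of PyTorch is an assumption based on a typical MARL stack, since the paper does not name its backend.

```python
# Record interpreter, OS, and framework versions for reproducibility notes.
import sys
import platform
import torch  # assumed deep-learning backend; the paper does not specify one

print("python  :", sys.version.split()[0])
print("platform:", platform.platform())
print("torch   :", torch.__version__)
print("cuda    :", torch.version.cuda)
```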