Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Individual Reward Assisted Multi-Agent Reinforcement Learning

Authors: Li Wang, Yupeng Zhang, Yujing Hu, Weixun Wang, Chongjie Zhang, Yang Gao, Jianye Hao, Tangjie Lv, Changjie Fan

ICML 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results in different scenarios, such as the Multi-Agent Particle Environment and the Google Research Football Environment, show that IRAT significantly outperforms the baseline methods and can greatly promote team policy learning without deviating from the original team objective, even when the individual rewards are misleading or conflict with the team rewards.
Researcher Affiliation | Collaboration | (1) National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China; (2) NetEase Fuxi AI Lab, Hangzhou, China; (3) College of Intelligence and Computing, Tianjin University, Tianjin, China; (4) Noah's Ark Lab, Beijing, China; (5) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China.
Pseudocode | Yes | The pseudo-code of the IRAT algorithm is shown in Algorithm 1.
Open Source Code | Yes | Code is available at https://github.com/MDrW/ICML2022IRAT.
Open Datasets | No | The paper uses well-known environments (Multi-Agent Particle Environment and Google Research Football) that generate data dynamically, but it does not provide concrete access information (a specific link, DOI, repository name, or formal citation) for a publicly available or open static dataset used for training.
Dataset Splits | No | The paper describes training and evaluation procedures within dynamic environments (e.g., "evaluated every 500,000 steps" in Google Research Football), but it does not specify explicit training/validation/test splits with percentages, sample counts, or citations to predefined static data partitions needed for reproduction.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper mentions the Adam optimizer and ReLU activations, but it does not give version numbers for key software components such as Python, PyTorch, or TensorFlow, which are necessary for full reproducibility.
Experiment Setup | Yes | The common parameters of all algorithms in different scenarios are shown in Table B.1, and the parameters of IRAT are shown in Table B.2. ... The hyperparameters chosen for the tested methods in the football experiment are listed in Table B.5.
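
The notice above states that the LLM-assigned labels were validated against a manually labeled dataset. A minimal sketch of such a per-document agreement check (the function name and example labels here are hypothetical, not the authors' actual pipeline):

```python
def label_accuracy(llm_labels, manual_labels):
    """Fraction of reproducibility variables where the LLM label
    matches the manual (gold) label, over the shared variables."""
    shared = sorted(set(llm_labels) & set(manual_labels))
    return sum(llm_labels[v] == manual_labels[v] for v in shared) / len(shared)

# Hypothetical labels for one paper: the LLM agrees with the manual
# annotator on two of three variables.
llm = {"Pseudocode": "Yes", "Open Source Code": "Yes", "Dataset Splits": "No"}
manual = {"Pseudocode": "Yes", "Open Source Code": "Yes", "Dataset Splits": "Yes"}

print(label_accuracy(llm, manual))  # 2 of 3 variables agree
```

Aggregating this agreement rate per variable across the validation set would yield the kind of accuracy metrics the notice refers to in [1].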