Sample-Efficient Multiagent Reinforcement Learning with Reset Replay

Authors: Yaodong Yang, Guangyong Chen, Jianye Hao, Pheng-Ann Heng

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in SMAC and MPE show that MARR significantly improves the performance of various MARL approaches with far fewer environment interactions.
Researcher Affiliation | Collaboration | Department of CSE, CUHK; Zhejiang Lab; Shenzhen Institutes of Advanced Technology, CAS; Tianjin University; Noah's Ark Lab, Huawei; Institute of Medical Intelligence and XR, CUHK.
Pseudocode | Yes | Algorithm 1: Multiagent Reinforcement Learning with Reset Replay (MARR)
Open Source Code | Yes | Code is available on GitHub.
Open Datasets | Yes | In this section, we validate MARR on both the StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019), which has a discrete action space, and the Multiagent Particle Environment (MPE) (Lowe et al., 2017), which has a continuous action space.
Dataset Splits | No | The paper uses standard benchmark environments (SMAC and MPE) and reports evaluation metrics (test win rate, episodic return) over multiple independent runs. It gives no explicit training/validation/test splits, because the data is generated through environment interaction rather than drawn from a fixed, pre-split dataset.
Hardware Specification | No | The paper mentions running experiments in parallel environments (it sets the 'number of parallel environments to 8'), but it does not specify hardware such as GPU models, CPU types, or cloud computing instances.
Software Dependencies | Yes | The SMAC environment has a discrete action space, and the StarCraft II version used is 4.6.2. We implement MARR based on the pymarl framework (Samvelyan et al., 2019).
Experiment Setup | Yes | For all tasks, we set α to 0.8 and the reset interval T_R to 2000 for Shrink & Perturb, and set a to 0.8 and b to 1.2 for the random amplitude scale.
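The Experiment Setup hyperparameters (α, T_R, a, b) can be illustrated with a minimal sketch. It assumes the common shrink-and-perturb formulation, in which trained weights are interpolated toward a freshly initialized copy with weight α, and RAD-style random amplitude scaling, in which a whole observation is multiplied by one scalar drawn from Uniform[a, b]. It also assumes that T_R counts training updates, which this summary does not state. The names `shrink_and_perturb`, `random_amplitude_scale`, `train`, and the flat `Params` representation are hypothetical and are not taken from the MARR code release.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical flat parameter store: parameter name -> list of weights.
Params = Dict[str, List[float]]


def shrink_and_perturb(params: Params, fresh: Params, alpha: float = 0.8) -> Params:
    """Pull trained weights toward a freshly initialized copy.

    Assumed form: theta <- alpha * theta + (1 - alpha) * phi,
    where phi is a new random initialization of the same network.
    """
    return {
        name: [alpha * w + (1.0 - alpha) * p for w, p in zip(weights, fresh[name])]
        for name, weights in params.items()
    }


def random_amplitude_scale(
    obs: List[float], rng: random.Random, a: float = 0.8, b: float = 1.2
) -> List[float]:
    """Scale a whole observation by one scalar z ~ Uniform[a, b] (assumed per-sample)."""
    z = rng.uniform(a, b)
    return [z * x for x in obs]


def train(
    num_updates: int,
    reset_interval: int,
    params: Params,
    init_fn: Callable[[], Params],
    update_fn: Callable[[Params], Params],
    alpha: float = 0.8,
) -> Tuple[Params, int]:
    """Run updates, applying shrink-and-perturb every `reset_interval` updates.

    Returns the final parameters and the number of resets performed.
    """
    resets = 0
    for step in range(1, num_updates + 1):
        params = update_fn(params)  # stand-in for one gradient update
        if step % reset_interval == 0:
            params = shrink_and_perturb(params, init_fn(), alpha)
            resets += 1
    return params, resets
```

With T_R = 2000, a run of 6000 updates triggers three resets, and with α = 0.8 each reset keeps 80% of the learned weights instead of discarding them as a hard re-initialization would. In a multiagent setting the augmentation would presumably be applied to each agent's sampled observations before an update; that placement is an assumption of this sketch.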