Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Authors: Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang

AAAI 2023

Reproducibility Assessment (variable, result, and LLM response)

Research Type: Experimental. "We evaluate our method in both discrete control (i.e., StarCraft II and multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results in complex and mixed offline multi-agent datasets, especially when the difference of data quality between individual trajectories is large."

Researcher Affiliation: Collaboration. 1 College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 2 Huawei Noah's Ark Lab, Beijing, China; 3 School of Data Science, Chinese University of Hong Kong (Shenzhen), Shenzhen, China. Emails: {tianqics,kunkuang}@zju.edu.cn, liufurui2@huawei.com, bxiangwang@cuhk.edu.cn

Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: No. The paper does not explicitly state that source code is released, and it provides no links to code repositories for the described methodology.

Open Datasets: Yes. "We construct the agent-wise imbalanced multi-agent datasets based on six maps in StarCraft II (Samvelyan et al. 2019), multi-agent particle environment (MPE) (Lowe et al. 2017) and multi-agent MuJoCo (MAMuJoCo) (Peng et al. 2021)."

Dataset Splits: No. The paper describes generating 'low-quality' and 'medium-quality' datasets and mentions using '5 random seeds' for evaluation, but it does not specify explicit train/validation/test splits or their percentages/counts.

Hardware Specification: No. The paper does not provide specific hardware details, such as GPU models, CPU models, or memory specifications, for the machines used to run the experiments.

Software Dependencies: No. The paper mentions various algorithms and environments (e.g., QMIX, FacMAC, StarCraft II), but it does not specify software names with version numbers for dependencies such as the programming language, deep learning framework, or other libraries.

Experiment Setup: No. The paper mentions using 'fine-tuned hyperparameters provided by the authors of BCQ and CQL' but does not specify concrete hyperparameter values, training configurations, or system-level settings for its own method.