Learning from Good Trajectories in Offline Multi-Agent Reinforcement Learning

Authors: Qi Tian, Kun Kuang, Furui Liu, Baoxiang Wang

AAAI 2023

Reproducibility Assessment (variable, result, and LLM response)

Research Type: Experimental. "We evaluate our method in both discrete control (i.e., StarCraft II and multi-agent particle environment) and continuous control (i.e., multi-agent MuJoCo). The results indicate that our method achieves significantly better results in complex and mixed offline multi-agent datasets, especially when the difference of data quality between individual trajectories is large."

Researcher Affiliation: Collaboration. 1 College of Computer Science and Technology, Zhejiang University, Hangzhou, China; 2 Huawei Noah's Ark Lab, Beijing, China; 3 School of Data Science, Chinese University of Hong Kong (Shenzhen), Shenzhen, China. Emails: {tianqics,kunkuang}@zju.edu.cn, liufurui2@huawei.com, bxiangwang@cuhk.edu.cn

Pseudocode: No. The paper does not contain any structured pseudocode or algorithm blocks.

Open Source Code: No. The paper does not explicitly state that source code is released, and it provides no links to code repositories for the described methodology.

Open Datasets: Yes. "We construct the agent-wise imbalanced multi-agent datasets based on six maps in StarCraft II (Samvelyan et al. 2019), multi-agent particle environment (MPE) (Lowe et al. 2017) and multi-agent MuJoCo (MAMuJoCo) (Peng et al. 2021)."

Dataset Splits: No. The paper describes generating 'low-quality' and 'medium-quality' datasets and mentions using '5 random seeds' for evaluation, but it does not specify explicit train/validation/test splits or their percentages/counts.

Hardware Specification: No. The paper does not provide specific hardware details, such as GPU models, CPU models, or memory specifications, for the machines used to run the experiments.

Software Dependencies: No. The paper mentions various algorithms and environments (e.g., QMIX, FacMAC, StarCraft II), but it does not specify software names with version numbers for dependencies such as the programming language, deep learning framework, or other libraries.

Experiment Setup: No. The paper mentions using 'fine-tuned hyperparameters provided by the authors of BCQ and CQL' but does not specify concrete hyperparameter values, training configurations, or system-level settings for its own method.