Counterfactual Conservative Q Learning for Offline Multi-agent Reinforcement Learning

Authors: Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further conduct experiments on four environments including both discrete and continuous action settings on both existing and our man-made datasets, demonstrating that CFCQL outperforms existing methods on most datasets and even with a remarkable margin on some of them.
Researcher Affiliation | Academia | Jianzhun Shao, Yun Qu, Chen Chen, Hongchang Zhang, Xiangyang Ji; Department of Automation, Tsinghua University, Beijing, China; {sjz18, qy22, hc-zhang19}@mails.tsinghua.edu.cn, cclvr@163.com, xyji@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1 CFCQL-D and CFCQL-C
Open Source Code | Yes | Our code and datasets are available at: https://github.com/thu-rllab/CFCQL
Open Datasets | Yes | 'With datasets collected by Pan et al. [34] and ourselves, our method outperforms existing methods in most settings and even with a large margin on some of them.' and 'Our code and datasets are available at: https://github.com/thu-rllab/CFCQL'
Dataset Splits | No | The paper describes how datasets were collected (e.g., 'The datasets are made based on the training process or trained model of QMIX[37]') but does not explicitly state train/validation/test splits by percentages, absolute counts, or by referencing predefined standard splits for their experiments.
Hardware Specification | Yes | We use 2 servers to run all the experiments. Each one has 8*NVIDIA RTX 3090 GPUs, and 2*AMD 7H12 CPUs. Each setting is repeated for 5 seeds.
Software Dependencies | No | The paper refers to using various open-source implementations (e.g., 'from Lowe et al. [27]', 'from Samvelyan et al. [40]') and general tools like 'Q-learning' or 'TD3', but does not provide specific version numbers for software dependencies or libraries (e.g., 'Python 3.8', 'PyTorch 1.9').
Experiment Setup | Yes | Please refer to this repository for the code, datasets and the hyper-parameters of our method.