Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning

Authors: The Viet Bui, Tien Mai, Thanh Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We present extensive experiments conducted on some challenging multi-agent game environments, including an advanced version of the StarCraft multi-agent challenge (SMACv2), which demonstrates the effectiveness of our algorithm."
Researcher Affiliation | Academia | The Viet Bui (Singapore Management University, Singapore, theviet.bui.2023@phdcs.smu.edu.sg); Tien Mai (Singapore Management University, Singapore, atmai@smu.edu.sg); Thanh Hong Nguyen (University of Oregon, Eugene, Oregon, United States, thanhhng@cs.uoregon.edu)
Pseudocode | Yes | "B.1 MIFQ Algorithm. The detailed steps of our MIFQ algorithm are shown in Algo. 1 below: Algorithm 1: Multi-agent Inverse Factorized Q-Learning"
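The paper's MIFQ algorithm combines inverse soft Q-learning with value factorization across agents. As a rough, hedged sketch of the quantities involved (not the authors' implementation: `soft_value`, the additive `mixed_q` stand-in for their learned mixing network, and all toy numbers are illustrative assumptions), the core pieces look like:

```python
import numpy as np

GAMMA = 0.99  # discount factor, matching Table 2

def soft_value(q_row):
    # Soft state value V(s) = log sum_a exp(Q(s, a)) over discrete actions,
    # computed with a max-shift for numerical stability.
    m = q_row.max()
    return m + np.log(np.exp(q_row - m).sum())

def mixed_q(local_q_rows, actions):
    # Additive (VDN-style) mixing of per-agent Q-values. The paper learns a
    # mixing network instead; this plain sum is only an illustrative stand-in.
    return sum(q[a] for q, a in zip(local_q_rows, actions))

# Toy example: 2 agents, 3 discrete actions each (numbers are made up).
q_agent1 = np.array([0.2, 1.0, -0.5])
q_agent2 = np.array([0.1, 0.3, 0.9])
expert_actions = [1, 2]

q_tot = mixed_q([q_agent1, q_agent2], expert_actions)  # Q_tot(s, a_expert)
v_next = soft_value(q_agent1 + q_agent2)  # toy V(s') from summed local Qs

# Inverse-soft-Q (IQ-Learn-style) recovered-reward surrogate for one
# expert transition: r_hat = Q_tot(s, a_E) - gamma * V(s').
r_hat = q_tot - GAMMA * v_next
```

Maximizing this surrogate on expert transitions while regularizing it on the learner's own transitions is the general shape of inverse soft Q-learning; the factorized, multi-agent specifics are in the paper's Algorithm 1.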
Open Source Code | Yes | "We also uploaded our source code for reproducibility purposes. Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for reproducibility or benchmarking purposes."
Open Datasets | Yes | "Finally, we conduct extensive experiments in three domains: SMACv2 [9], Gold Miner [12], and MPE (Multi Particle Environments) [25]."
Dataset Splits | No | The paper refers to using expert trajectories for imitation learning and replay buffers for training, but does not specify explicit train/validation/test dataset splits with percentages or counts for the expert demonstrations.
Hardware Specification | Yes | "We use four High-Performance Computing (HPC) clusters for training and evaluating all tasks. Specifically, each HPC cluster has a workload with an NVIDIA L40 GPU 48 GB GDDR6, 32 Intel-CPU cores, and 100GB RAM."
Software Dependencies | No | The paper provides general hyperparameters in Table 2 but does not list specific software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers.
Experiment Setup | Yes | Table 2: Hyper-parameters (columns: MPEs, Miner, SMACv2; rows listing one value share it across environments):
- Max training steps: 100000 / 1000000
- Evaluate times: 32
- Buffer size: 100000 / 5000
- Learning rate: 2e-5 / 5e-4
- Batch size: 128
- Hidden dim: 256
- Gamma: 0.99
- Target update frequency: 4
- Number of random seeds: 4
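For anyone re-running the setup, Table 2 can be transcribed into a config. Note the flattened table pairs two values for some rows across the MPEs / Miner / SMACv2 columns without an unambiguous per-environment assignment, so those are kept as tuples below (an assumption about layout, not the authors' exact mapping):

```python
# Hyper-parameters transcribed from Table 2 of the paper.
# Tuple-valued entries list the two values the table gives for the
# three environments; single values are shared across environments.
HPARAMS = {
    "max_training_steps": (100_000, 1_000_000),
    "evaluate_times": 32,
    "buffer_size": (100_000, 5_000),
    "learning_rate": (2e-5, 5e-4),
    "batch_size": 128,
    "hidden_dim": 256,
    "gamma": 0.99,
    "target_update_frequency": 4,
    "num_random_seeds": 4,
}
```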