Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning
Authors: The Viet Bui, Tien Mai, Thanh Nguyen
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present extensive experiments conducted on some challenging multi-agent game environments, including an advanced version of the Star-Craft multi-agent challenge (SMACv2), which demonstrates the effectiveness of our algorithm. |
| Researcher Affiliation | Academia | The Viet Bui, Singapore Management University, Singapore, EMAIL; Tien Mai, Singapore Management University, Singapore, EMAIL; Thanh Hong Nguyen, University of Oregon, Eugene, Oregon, United States, EMAIL |
| Pseudocode | Yes | B.1 MIFQ Algorithm The detailed steps of our MIFQ algorithm are shown in Algo. 1 below: Algorithm 1: Multi-agent Inverse Factorized Q-Learning |
| Open Source Code | Yes | We also uploaded our source code for re-productivity purposes. Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for re-producibility or benchmarking purposes. |
| Open Datasets | Yes | Finally, we conduct extensive experiments in three domains: SMACv2 [9], Gold Miner [12], and MPE (Multi Particle Environments) [25]. |
| Dataset Splits | No | The paper refers to using expert trajectories for imitation learning and replay buffers for training, but does not specify explicit train/validation/test dataset splits with percentages or counts for the expert demonstrations. |
| Hardware Specification | Yes | We use four High-Performance Computing (HPC) clusters for training and evaluating all tasks. Specifically, each HPC cluster has a workload with an NVIDIA L40 GPU 48 GB GDDR6, 32 Intel-CPU cores, and 100GB RAM. |
| Software Dependencies | No | The paper provides general hyperparameters in Table 2 but does not list specific software dependencies (e.g., programming languages, libraries, or frameworks) with version numbers. |
| Experiment Setup | Yes | Table 2: Hyper-parameters. Arguments: MPEs, Miner, SMACv2. Max training steps: 100000, 1000000. Evaluate times: 32. Buffer size: 100000, 5000. Learning rate: 2e-5, 5e-4. Batch size: 128. Hidden dim: 256. Gamma: 0.99. Target update frequency: 4. Number of random seeds: 4. |