Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Games

Authors: The Viet Bui, Tien Mai, Thanh Hong Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in three challenging game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to state-of-the-art MARL algorithms.
Researcher Affiliation | Academia | The Viet Bui (Singapore Management University, Singapore; theviet.bui.2023@phdcs.smu.edu.sg); Tien Mai (Singapore Management University, Singapore; atmai@smu.edu.sg); Thanh Hong Nguyen (University of Oregon, Eugene, Oregon, United States; thanhhng@cs.uoregon.edu)
Pseudocode | Yes | Algorithm 1 (IMAX-PPO); a hedged implementation sketch follows the table.
    Input: initial allies policy network Π^α_θ, initial allies value network V^α_{θ_v}, initial imitator policy network Π̂^e_{ψ_π}, initial imitator Q network Q^e_{ψ_Q}, learning rates κ^e_π, κ^e_Q, κ^α_π, κ^α_V.
    Output: trained allies policy Π^α_θ.
    1: for t = 0, 1, ... do
    2:   # Updating imitator:
    3:   ψ_{Q,t+1} = ψ_{Q,t} + κ^e_Q ∇_{ψ_Q} J(ψ_Q)   # train the Q function using the objective in (6)
    4:   ψ_{π,t+1} = ψ_{π,t} − κ^e_π ∇_{ψ_π} E_{S^{e,next}_i}[V^e_Π(S^{e,next}_i)]   # update the imitator policy Π̂^e_{ψ_π} (for continuous domains)
    5:   # Updating allies policy:
    6:   θ_{t+1} = θ_t + κ^α_π ∇_θ L^α(θ)   # update the allies actor by maximizing L^α(θ)
    7:   θ_{v,t+1} = θ_{v,t} − κ^α_V ∇_{θ_v} Φ^α(θ_v)   # update the allies critic by minimizing Φ^α(θ_v)
    8: end for
    9: return the trained allies policy Π^α_θ
Open Source Code | Yes | We also uploaded our source code for reproducibility purposes. ... Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for reproducibility or benchmarking purposes.
Open Datasets | Yes | Finally, we conduct extensive experiments in several benchmarks ranging from complex to simple ones, including: SMACv2 (an advanced version of the StarCraft multi-agent challenge) [4], Google Research Football (GRF) [15], and Gold Miner [7].
Dataset Splits | No | The paper mentions 'evaluation' and '32 different rounds of game playing' but does not provide explicit training, validation, and test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper notes only that all sub-tasks (SMACv2, GRF, Miner) are trained concurrently on a GPU-accelerated HPC (high-performance computing) system; no GPU models, counts, or memory sizes are specified.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions, or specific library versions).
Experiment Setup | Yes | For a fair comparison between our proposed algorithm and existing methods, we use the same model architecture and hyperparameters, as shown in Tables 2 and 3, respectively. ... Table 2: Hyperparameters
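
To make the update structure of Algorithm 1 concrete, below is a minimal PyTorch sketch of one IMAX-PPO-style training loop. Everything in it is an illustrative assumption: the four linear networks, the placeholder losses, the random batch, and the learning-rate values all stand in for the paper's actual architectures and objectives J(ψ_Q), E[V^e_Π(·)], L^α(θ), and Φ^α(θ_v), which are defined in the paper (e.g., objective (6)) and are not reproduced here. In particular, the actor update uses a bare policy-gradient surrogate rather than the full clipped PPO objective.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the four networks in Algorithm 1.
obs_dim, act_dim = 8, 4
imitator_q      = nn.Linear(obs_dim + act_dim, 1)  # Q^e_{psi_Q}
imitator_policy = nn.Linear(obs_dim, act_dim)      # Pi-hat^e_{psi_pi}
allies_actor    = nn.Linear(obs_dim, act_dim)      # Pi^alpha_theta
allies_critic   = nn.Linear(obs_dim, 1)            # V^alpha_{theta_v}

# Learning rates kappa^e_Q, kappa^e_pi, kappa^alpha_pi, kappa^alpha_V
# (placeholder values, not the paper's settings from Table 2).
opt_q      = torch.optim.SGD(imitator_q.parameters(), lr=1e-3)
opt_pi     = torch.optim.SGD(imitator_policy.parameters(), lr=1e-3)
opt_actor  = torch.optim.SGD(allies_actor.parameters(), lr=3e-4)
opt_critic = torch.optim.SGD(allies_critic.parameters(), lr=1e-3)

# Placeholder batch; a real run would collect rollouts from the environment.
obs, next_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
targets = torch.randn(32, 1)  # stands in for return / regression targets

for t in range(10):
    # --- Updating imitator (Algorithm 1, lines 3-4) ---
    # Line 3: improve the Q objective J(psi_Q); a squared-error fit to the
    # placeholder targets is used here instead of the paper's objective (6).
    act = imitator_policy(obs).detach()
    q_loss = ((imitator_q(torch.cat([obs, act], dim=-1)) - targets) ** 2).mean()
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # Line 4: descend E[V^e_Pi(S^{e,next}_i)], approximated here by the Q value
    # of the imitator policy's action in the next state (continuous-domain form).
    next_act = imitator_policy(next_obs)
    v_next = imitator_q(torch.cat([next_obs, next_act], dim=-1)).mean()
    opt_pi.zero_grad(); v_next.backward(); opt_pi.step()

    # --- Updating allies policy (Algorithm 1, lines 6-7) ---
    # Line 6: ascend L^alpha(theta); here a bare policy-gradient surrogate,
    # minimized with its sign flipped (not the full clipped-PPO objective).
    advantages = (targets - allies_critic(obs)).detach()
    log_probs = torch.log_softmax(allies_actor(obs), dim=-1).max(dim=-1, keepdim=True).values
    actor_loss = -(log_probs * advantages).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Line 7: descend the critic loss Phi^alpha(theta_v), here a squared error.
    critic_loss = ((allies_critic(obs) - targets) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

The structural point the sketch preserves is the ordering of Algorithm 1: within each iteration the imitator's Q function and policy are refreshed first (lines 3-4), and the allies' actor and critic are then updated against the refreshed imitator (lines 6-7).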