Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Games
Authors: The Viet Bui, Tien Mai, Thanh Nguyen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments in three challenging game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to state-of-the-art MARL algorithms. |
| Researcher Affiliation | Academia | The Viet Bui, Singapore Management University, Singapore (theviet.bui.2023@phdcs.smu.edu.sg); Tien Mai, Singapore Management University, Singapore (atmai@smu.edu.sg); Thanh Hong Nguyen, University of Oregon, Eugene, Oregon, United States (thanhhng@cs.uoregon.edu) |
| Pseudocode | Yes | Algorithm 1 IMAX-PPO Algorithm. Input: initial allies policy network $\Pi^\alpha_\theta$, initial allies value network $V^\alpha_{\theta_v}$, initial imitator's policy network $\hat{\Pi}^e_{\psi_\pi}$, initial imitator's Q network $Q^e_{\psi_Q}$, learning rates $\kappa^e_\pi, \kappa^e_Q, \kappa^\alpha_\pi, \kappa^\alpha_V$. Output: trained allies policy $\Pi^\alpha_\theta$. 1: for $t = 0, 1, \ldots$ do 2: # Updating imitator: 3: $\psi_{Q,t+1} = \psi_{Q,t} + \kappa^e_Q \nabla_{\psi_Q} J(\psi_Q)$ # Train Q function using the objective in (6) 4: $\psi_{\pi,t+1} = \psi_{\pi,t} - \kappa^e_\pi \nabla_{\psi_\pi} \mathbb{E}_{S^{e,\mathrm{next}}_i}\big[V^e_\Pi(S^{e,\mathrm{next}}_i)\big]$ # Update policy $\hat{\Pi}^e_{\psi_\pi}$ (for continuous domains) 5: # Updating allies policy: 6: $\theta_{t+1} = \theta_t + \kappa^\alpha_\pi \nabla_\theta L^\alpha(\theta)$ # Update allies actor by maximizing $L^\alpha(\theta)$ 7: $\theta_{v,t+1} = \theta_{v,t} - \kappa^\alpha_V \nabla_{\theta_v} \Phi^\alpha(\theta_v)$ # Update allies critic by minimizing $\Phi^\alpha(\theta_v)$ 8: end for 9: return policy solution for allied agents (a hedged implementation sketch of this update loop appears below the table) |
| Open Source Code | Yes | We also uploaded our source code for reproducibility purposes. ... Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for reproducibility or benchmarking purposes. |
| Open Datasets | Yes | Finally, we conduct extensive experiments in several benchmarks ranging from complex to simple ones, including: SMACv2 (an advanced version of the StarCraft multi-agent challenge) [4], Google research football (GRF) [15], and Gold Miner [7]. |
| Dataset Splits | No | The paper mentions 'evaluation' and '32 different rounds of game playing' but does not provide explicit training, validation, and test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | All sub-tasks (SMACv2, GRF, Miner) are trained concurrently in a GPU-accelerated HPC (High Performance Computing). |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions, or specific library versions). |
| Experiment Setup | Yes | For a fair comparison between our proposed algorithm and existing methods, we use the same model architecture and hyperparameters as shown in Tables 2 and 3 respectively. ... Table 2: Hyperparameters |
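The Pseudocode row above quotes the paper's Algorithm 1, which alternates between updating the imitator (enemy model) and updating the allies' PPO actor-critic. Below is a minimal PyTorch sketch of that alternating loop, written under illustrative assumptions: discrete actions, randomly generated rollout data, and placeholder losses standing in for the paper's objectives $J(\psi_Q)$ (Eq. 6), $\mathbb{E}[V^e_\Pi]$, $L^\alpha(\theta)$, and $\Phi^\alpha(\theta_v)$. Every network architecture, variable name, and hyperparameter here is a hypothetical stand-in, not the authors' released implementation.

```python
import torch
import torch.nn as nn

obs_dim, act_dim, batch = 32, 5, 64  # hypothetical dimensions

# Hypothetical stand-ins for the paper's four networks.
ally_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))      # Pi^alpha_theta
ally_value  = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, 1))            # V^alpha_{theta_v}
imit_policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))      # hat{Pi}^e_{psi_pi}
imit_q      = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))  # Q^e_{psi_Q}

opt = {
    "imit_q":  torch.optim.Adam(imit_q.parameters(),      lr=1e-3),  # kappa^e_Q
    "imit_pi": torch.optim.Adam(imit_policy.parameters(), lr=1e-3),  # kappa^e_pi
    "ally_pi": torch.optim.Adam(ally_policy.parameters(), lr=3e-4),  # kappa^alpha_pi
    "ally_v":  torch.optim.Adam(ally_value.parameters(),  lr=1e-3),  # kappa^alpha_V
}

for t in range(3):  # outer loop of Algorithm 1, truncated for the sketch
    # Dummy rollout data; a real run would collect trajectories from the environment.
    obs      = torch.randn(batch, obs_dim)
    next_obs = torch.randn(batch, obs_dim)
    q_target = torch.randn(batch)          # placeholder regression target for J(psi_Q)
    actions  = torch.randint(0, act_dim, (batch,))
    returns  = torch.randn(batch)
    adv      = torch.randn(batch)
    old_logp = torch.randn(batch)

    # --- Updating imitator ---
    # Step 3: improve the imitator's Q function (here: squared error to a placeholder target).
    a_onehot = nn.functional.one_hot(actions, act_dim).float()
    q_loss = ((imit_q(torch.cat([obs, a_onehot], -1)).squeeze(-1) - q_target) ** 2).mean()
    opt["imit_q"].zero_grad(); q_loss.backward(); opt["imit_q"].step()

    # Step 4: descend a surrogate for E[V^e_Pi(S^{e,next})]; the Q network is used but not stepped here.
    probs = imit_policy(next_obs).softmax(-1)
    v_e = imit_q(torch.cat([next_obs, probs], -1)).mean()
    opt["imit_pi"].zero_grad(); v_e.backward(); opt["imit_pi"].step()

    # --- Updating allies policy (PPO-style actor-critic step) ---
    # Step 6: maximize a clipped surrogate standing in for L^alpha(theta).
    logp = ally_policy(obs).log_softmax(-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    ratio = (logp - old_logp).exp()
    actor_loss = -torch.min(ratio * adv, ratio.clamp(0.8, 1.2) * adv).mean()
    opt["ally_pi"].zero_grad(); actor_loss.backward(); opt["ally_pi"].step()

    # Step 7: minimize a value-regression loss standing in for Phi^alpha(theta_v).
    critic_loss = ((ally_value(obs).squeeze(-1) - returns) ** 2).mean()
    opt["ally_v"].zero_grad(); critic_loss.backward(); opt["ally_v"].step()
```

In an actual reproduction, the dummy tensors would be replaced by rollouts collected from SMACv2, GRF, or Gold Miner, and the placeholder losses by the objectives defined in the paper.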