Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Games

Authors: The Viet Bui, Tien Mai, Thanh Hong Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments in three challenging game environments, including an advanced version of the StarCraft multi-agent challenge (i.e., SMACv2). Experimental results show that our approach achieves superior performance compared to state-of-the-art MARL algorithms.
Researcher Affiliation | Academia | The Viet Bui (Singapore Management University, Singapore; theviet.bui.2023@phdcs.smu.edu.sg); Tien Mai (Singapore Management University, Singapore; atmai@smu.edu.sg); Thanh Hong Nguyen (University of Oregon, Eugene, Oregon, United States; thanhhng@cs.uoregon.edu)
Pseudocode | Yes | Algorithm 1 (IMAX-PPO); a hedged implementation sketch follows the table.
    Input: initial allies policy network Π^α_θ, initial allies value network V^α_{θ_v}, initial imitator policy network Π̂^e_{ψ_π}, initial imitator Q network Q^e_{ψ_Q}, learning rates κ^e_π, κ^e_Q, κ^α_π, κ^α_V.
    Output: trained allies policy Π^α_θ.
    1: for t = 0, 1, ... do
    2:   # Updating imitator:
    3:   ψ_{Q,t+1} = ψ_{Q,t} + κ^e_Q ∇_{ψ_Q} J(ψ_Q)   # train the Q function using the objective in (6)
    4:   ψ_{π,t+1} = ψ_{π,t} − κ^e_π ∇_{ψ_π} E_{S^{e,next}_i}[V^e_Π(S^{e,next}_i)]   # update the imitator policy Π̂^e_{ψ_π} (for continuous domains)
    5:   # Updating allies policy:
    6:   θ_{t+1} = θ_t + κ^α_π ∇_θ L^α(θ)   # update the allies actor by maximizing L^α(θ)
    7:   θ_{v,t+1} = θ_{v,t} − κ^α_V ∇_{θ_v} Φ^α(θ_v)   # update the allies critic by minimizing Φ^α(θ_v)
    8: end for
    9: return the trained allies policy Π^α_θ
Open Source Code | Yes | We also uploaded our source code for reproducibility purposes. ... Our source code is submitted alongside the paper, accompanied by sufficient instructions. We will share the code publicly for reproducibility or benchmarking purposes.
Open Datasets | Yes | Finally, we conduct extensive experiments in several benchmarks ranging from complex to simple ones, including: SMACv2 (an advanced version of the StarCraft multi-agent challenge) [4], Google Research Football (GRF) [15], and Gold Miner [7].
Dataset Splits | No | The paper mentions 'evaluation' and '32 different rounds of game playing' but does not provide explicit training, validation, and test dataset splits (e.g., percentages or sample counts).
Hardware Specification | No | The paper notes only that all sub-tasks (SMACv2, GRF, Miner) are trained concurrently on a GPU-accelerated HPC (high-performance computing) system; no GPU models, counts, or memory sizes are specified.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions, or specific library versions).
Experiment Setup | Yes | For a fair comparison between our proposed algorithm and existing methods, we use the same model architecture and hyperparameters, as shown in Tables 2 and 3, respectively. ... Table 2: Hyperparameters
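
To make the update structure of Algorithm 1 concrete, below is a minimal PyTorch sketch of one IMAX-PPO-style training loop. Everything in it is an illustrative assumption: the four linear networks, the placeholder losses, the random batch, and the learning-rate values all stand in for the paper's actual architectures and objectives J(ψ_Q), E[V^e_Π(·)], L^α(θ), and Φ^α(θ_v), which are defined in the paper (e.g., objective (6)) and are not reproduced here. In particular, the actor update uses a bare policy-gradient surrogate rather than the full clipped PPO objective.

import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the four networks in Algorithm 1.
obs_dim, act_dim = 8, 4
imitator_q      = nn.Linear(obs_dim + act_dim, 1)  # Q^e_{psi_Q}
imitator_policy = nn.Linear(obs_dim, act_dim)      # Pi-hat^e_{psi_pi}
allies_actor    = nn.Linear(obs_dim, act_dim)      # Pi^alpha_theta
allies_critic   = nn.Linear(obs_dim, 1)            # V^alpha_{theta_v}

# Learning rates kappa^e_Q, kappa^e_pi, kappa^alpha_pi, kappa^alpha_V
# (placeholder values, not the paper's settings from Table 2).
opt_q      = torch.optim.SGD(imitator_q.parameters(), lr=1e-3)
opt_pi     = torch.optim.SGD(imitator_policy.parameters(), lr=1e-3)
opt_actor  = torch.optim.SGD(allies_actor.parameters(), lr=3e-4)
opt_critic = torch.optim.SGD(allies_critic.parameters(), lr=1e-3)

# Placeholder batch; a real run would collect rollouts from the environment.
obs, next_obs = torch.randn(32, obs_dim), torch.randn(32, obs_dim)
targets = torch.randn(32, 1)  # stands in for return / regression targets

for t in range(10):
    # --- Updating imitator (Algorithm 1, lines 3-4) ---
    # Line 3: improve the Q objective J(psi_Q); a squared-error fit to the
    # placeholder targets is used here instead of the paper's objective (6).
    act = imitator_policy(obs).detach()
    q_loss = ((imitator_q(torch.cat([obs, act], dim=-1)) - targets) ** 2).mean()
    opt_q.zero_grad(); q_loss.backward(); opt_q.step()

    # Line 4: descend E[V^e_Pi(S^{e,next}_i)], approximated here by the Q value
    # of the imitator policy's action in the next state (continuous-domain form).
    next_act = imitator_policy(next_obs)
    v_next = imitator_q(torch.cat([next_obs, next_act], dim=-1)).mean()
    opt_pi.zero_grad(); v_next.backward(); opt_pi.step()

    # --- Updating allies policy (Algorithm 1, lines 6-7) ---
    # Line 6: ascend L^alpha(theta); here a bare policy-gradient surrogate,
    # minimized with its sign flipped (not the full clipped-PPO objective).
    advantages = (targets - allies_critic(obs)).detach()
    log_probs = torch.log_softmax(allies_actor(obs), dim=-1).max(dim=-1, keepdim=True).values
    actor_loss = -(log_probs * advantages).mean()
    opt_actor.zero_grad(); actor_loss.backward(); opt_actor.step()

    # Line 7: descend the critic loss Phi^alpha(theta_v), here a squared error.
    critic_loss = ((allies_critic(obs) - targets) ** 2).mean()
    opt_critic.zero_grad(); critic_loss.backward(); opt_critic.step()

The structural point the sketch preserves is the ordering of Algorithm 1: within each iteration the imitator's Q function and policy are refreshed first (lines 3-4), and the allies' actor and critic are then updated against the refreshed imitator (lines 6-7).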