LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning
Authors: Hyungho Na, Il-Chul Moon
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is evaluated on StarCraft II (with both dense and sparse reward settings) and Google Research Football. Empirical results show further performance improvement over state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | 1Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea. 2summary.ai, Daejeon, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 Goal-reaching Trajectory and Intrinsic Reward Generation. Algorithm 2 Update Sequence Buffer D_seq. Algorithm 3 Compute J(t). Algorithm 4 Training algorithm for VQ-VAE and D_VQ. Algorithm 5 Training algorithm for LAGMA. |
| Open Source Code | Yes | Our code is available at: https://github.com/aailabkaist/LAGMA. |
| Open Datasets | Yes | We consider complex multi-agent tasks such as SMAC (Samvelyan et al., 2019) and GRF (Kurach et al., 2020) as benchmark problems. |
| Dataset Splits | No | The paper mentions training and testing on environments like SMAC and GRF but does not explicitly state training/validation/test dataset splits using specific percentages or counts for a validation set. |
| Hardware Specification | Yes | A GeForce RTX 3090 is used for 5m_vs_6m and a GeForce RTX 4090 for 8m (sparse) and MMM2. |
| Software Dependencies | No | The paper describes the algorithms and frameworks used (e.g., QMIX), but it does not explicitly list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x, or specific library versions). |
| Experiment Setup | Yes | Table 3: Hyperparameter settings for experiments. For the update interval n_freq in Algorithm 1, we use the same value n_freq = 5 for all experiments. ϵ_T represents the annealing time for the exploration rate of ϵ-greedy, from 1.0 to 0.05. After some parametric studies, adjusting the VQ-VAE training hyperparameters such as n^cd_freq and n^vq_freq, instead of varying the λ values listed as λ_vq, λ_commit, and λ_cvr, provides a more efficient way of searching the parametric space. |
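The ϵ-greedy annealing quoted above (exploration rate decayed from 1.0 to 0.05 over ϵ_T steps) is a standard linear schedule; a minimal sketch is shown below. The function name, step-based interface, and default values other than the 1.0 → 0.05 range are illustrative assumptions, not taken from the paper or its code.

```python
def epsilon_schedule(step: int, anneal_steps: int,
                     eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Linearly anneal the exploration rate from eps_start to eps_end
    over anneal_steps environment steps, then hold at eps_end.

    Illustrative sketch of the annealing described in Table 3;
    not the authors' implementation.
    """
    frac = min(step / anneal_steps, 1.0)  # fraction of annealing completed
    return eps_start + frac * (eps_end - eps_start)
```

With ϵ_T = 50,000 steps, for example, the rate starts at 1.0, reaches 0.525 at the halfway point, and stays at 0.05 after step 50,000.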