LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning

Authors: Hyungho Na, Il-Chul Moon

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed method is evaluated by StarCraft II with both dense and sparse reward settings and Google Research Football. Empirical results show further performance improvement over state-of-the-art baselines.
Researcher Affiliation | Collaboration | (1) Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea. (2) summary.ai, Daejeon, Republic of Korea.
Pseudocode | Yes | Algorithm 1: Goal-reaching Trajectory and Intrinsic Reward Generation. Algorithm 2: Update Sequence Buffer D_seq. Algorithm 3: Compute J(t). Algorithm 4: Training algorithm for VQ-VAE and D_VQ. Algorithm 5: Training algorithm for LAGMA.
Open Source Code | Yes | Our code is available at: https://github.com/aailabkaist/LAGMA.
Open Datasets | Yes | We consider complex multi-agent tasks such as SMAC (Samvelyan et al., 2019) and GRF (Kurach et al., 2020) as benchmark problems.
Dataset Splits | No | The paper mentions training and testing on environments like SMAC and GRF but does not explicitly state training/validation/test dataset splits using specific percentages or counts for a validation set.
Hardware Specification | Yes | GeForce RTX 3090 is used for 5m_vs_6m and GeForce RTX 4090 for 8m (sparse) and MMM2.
Software Dependencies | No | The paper describes the algorithms and frameworks used (e.g., QMIX), but it does not explicitly list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x, or specific library versions).
Experiment Setup | Yes | Table 3: Hyperparameter settings for experiments. For the update interval n_freq in Algorithm 1, we use the same value n_freq = 5 for all experiments. ϵ_T represents the annealing time for the exploration rate of ϵ-greedy, from 1.0 to 0.05. After some parametric studies, adjusting the hyperparameters for VQ-VAE training, such as n_freq^cd and n_freq^vq, instead of varying the λ values listed as λ_vq, λ_commit, and λ_cvr, provides a more efficient way of searching the parametric space.
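The ϵ-greedy annealing mentioned in the Experiment Setup entry can be illustrated with a minimal Python sketch. It assumes a linear decay from 1.0 to 0.05 over ϵ_T environment steps; the entry above gives only the endpoints, so the linear shape is an assumption, and the function name `epsilon_at` and the example value of 50,000 steps are hypothetical, not taken from the paper or its repository.

```python
def epsilon_at(step: int, anneal_steps: int,
               start: float = 1.0, finish: float = 0.05) -> float:
    """Return the exploration rate at a given environment step.

    Assumption: epsilon decays linearly from `start` to `finish`
    over `anneal_steps` steps and is held at `finish` afterwards.
    """
    if anneal_steps <= 0:
        # Degenerate schedule: no annealing period, use the final value.
        return finish
    # Fraction of the annealing period completed, clipped to [0, 1].
    frac = min(max(step, 0) / anneal_steps, 1.0)
    return start + frac * (finish - start)


if __name__ == "__main__":
    # Hypothetical annealing time; the per-task values are in the paper's Table 3.
    eps_T = 50_000
    for t in (0, 25_000, 50_000, 100_000):
        print(t, round(epsilon_at(t, eps_T), 4))
```

Under these assumptions, a larger ϵ_T keeps the agents exploring for longer before the rate settles at 0.05.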