LAGMA: LAtent Goal-guided Multi-Agent Reinforcement Learning
Authors: Hyungho Na, Il-Chul Moon
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The proposed method is evaluated on StarCraft II (with both dense and sparse reward settings) and Google Research Football. Empirical results show further performance improvement over state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | 1Korea Advanced Institute of Science and Technology (KAIST), Daejeon 34141, Republic of Korea. 2summary.ai, Daejeon, Republic of Korea. |
| Pseudocode | Yes | Algorithm 1 Goal-reaching Trajectory and Intrinsic Reward Generation. Algorithm 2 Update Sequence Buffer D_seq. Algorithm 3 Compute J(t). Algorithm 4 Training algorithm for VQ-VAE and D_VQ. Algorithm 5 Training algorithm for LAGMA. |
| Open Source Code | Yes | Our code is available at: https://github.com/aailabkaist/LAGMA. |
| Open Datasets | Yes | We consider complex multi-agent tasks such as SMAC (Samvelyan et al., 2019) and GRF (Kurach et al., 2020) as benchmark problems. |
| Dataset Splits | No | The paper mentions training and testing on environments like SMAC and GRF but does not explicitly state training/validation/test dataset splits using specific percentages or counts for a validation set. |
| Hardware Specification | Yes | A GeForce RTX 3090 is used for 5m_vs_6m and a GeForce RTX 4090 for 8m (sparse) and MMM2. |
| Software Dependencies | No | The paper describes the algorithms and frameworks used (e.g., QMIX), but it does not explicitly list specific software dependencies with their version numbers (e.g., Python 3.x, PyTorch 1.x, or specific library versions). |
| Experiment Setup | Yes | Table 3: Hyperparameter settings for experiments. For the update interval n_freq in Algorithm 1, we use the same value n_freq = 5 for all experiments. ϵ_T represents the annealing time for the exploration rate of ϵ-greedy, from 1.0 to 0.05. After some parametric studies, adjusting the VQ-VAE training hyperparameters such as n^cd_freq and n^vq_freq, instead of varying the λ values listed as λ_vq, λ_commit, and λ_cvr, provides a more efficient way of searching the parametric space. |
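The ϵ-greedy annealing quoted above (exploration rate decayed from 1.0 to 0.05 over ϵ_T steps) is a standard linear schedule; a minimal sketch is shown below. The function name, step-based interface, and default values other than the 1.0 → 0.05 range are illustrative assumptions, not taken from the paper or its code.

```python
def epsilon_schedule(step: int, anneal_steps: int,
                     eps_start: float = 1.0, eps_end: float = 0.05) -> float:
    """Linearly anneal the exploration rate from eps_start to eps_end
    over anneal_steps environment steps, then hold at eps_end.

    Illustrative sketch of the annealing described in Table 3;
    not the authors' implementation.
    """
    frac = min(step / anneal_steps, 1.0)  # fraction of annealing completed
    return eps_start + frac * (eps_end - eps_start)
```

With ϵ_T = 50,000 steps, for example, the rate starts at 1.0, reaches 0.525 at the halfway point, and stays at 0.05 after step 50,000.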