MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer
Authors: Jeewon Jeon, Woojun Kim, Whiyoung Jung, Youngchul Sung
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results show that MASER significantly outperforms other state-of-the-art MARL algorithms on the StarCraft II micromanagement benchmark. |
| Researcher Affiliation | Academia | 1School of Electrical Engineering, KAIST, Daejeon, South Korea. |
| Pseudocode | No | The paper describes its algorithm using prose and mathematical equations but does not include a formal pseudocode block or an 'Algorithm' section. |
| Open Source Code | Yes | The source code of the proposed algorithm is available at https://github.com/Jiwonjeon9603/MASER. |
| Open Datasets | Yes | To evaluate MASER, we considered the widely-used StarCraft II micromanagement benchmark (SMAC) environment. |
| Dataset Splits | No | The paper mentions using different random seeds for experiments and provides hyperparameters but does not explicitly describe data splits for training, validation, and testing (e.g., 80/10/10 split or specific counts). |
| Hardware Specification | Yes | Our code is based on PyTorch and we used an NVIDIA TITAN Xp GPU. |
| Software Dependencies | No | The paper mentions software components such as PyTorch, DRQN, GRU, and RMSProp, but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | The values of hyper-parameters are shown in Table 2. In MASER, the most recent 5000 episodes are stored in the replay buffer, and the mini-batch size is 32. MASER updates the target network every 200 episodes. We used RMSProp as the optimizer with a learning rate of 0.0005. The discount factor for the expected return is 0.99, and the epsilon for epsilon-greedy Q-learning starts at 1.0 and is annealed to 0.05 over 50000 steps. (See the configuration sketch below the table.) |
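
For readers wanting to reproduce the reported setup, the following is a minimal sketch of the stated hyper-parameters collected into a Python configuration dictionary. The key names and the `anneal_epsilon` helper are illustrative assumptions, not identifiers from the MASER repository; only the numeric values come from the paper, and the linear annealing schedule is assumed since the paper states only the start, end, and anneal time.

```python
# Hyper-parameters reported for MASER (values from the paper; key names are assumptions).
MASER_CONFIG = {
    "replay_buffer_episodes": 5000,   # most recent episodes kept in the replay buffer
    "batch_size": 32,                 # mini-batch size
    "target_update_interval": 200,    # target network update period (episodes)
    "optimizer": "RMSProp",
    "learning_rate": 0.0005,
    "gamma": 0.99,                    # discount factor for the expected return
    "epsilon_start": 1.0,             # epsilon-greedy exploration schedule
    "epsilon_finish": 0.05,
    "epsilon_anneal_steps": 50000,
}


def anneal_epsilon(step: int, cfg: dict = MASER_CONFIG) -> float:
    """Anneal epsilon from epsilon_start to epsilon_finish.

    A linear schedule is assumed here; the paper only reports the start
    value, end value, and anneal time.
    """
    frac = min(step / cfg["epsilon_anneal_steps"], 1.0)
    return cfg["epsilon_start"] + frac * (cfg["epsilon_finish"] - cfg["epsilon_start"])


if __name__ == "__main__":
    # Example: epsilon at a few points along the assumed schedule.
    for step in (0, 25000, 50000, 100000):
        print(step, round(anneal_epsilon(step), 3))
```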