MASER: Multi-Agent Reinforcement Learning with Subgoals Generated from Experience Replay Buffer

Authors: Jeewon Jeon, Woojun Kim, Whiyoung Jung, Youngchul Sung

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical results show that MASER significantly outperforms other state-of-the-art MARL algorithms on the StarCraft II micromanagement benchmark.
Researcher Affiliation | Academia | School of Electrical Engineering, KAIST, Daejeon, South Korea.
Pseudocode | No | The paper describes its algorithm using prose and mathematical equations but does not include a formal pseudocode block or an 'Algorithm' section.
Open Source Code | Yes | The source code of the proposed algorithm is available at https://github.com/Jiwonjeon9603/MASER.
Open Datasets | Yes | To evaluate MASER, we considered the widely used StarCraft II micromanagement benchmark (SMAC) environment.
Dataset Splits | No | The paper mentions using different random seeds for experiments and provides hyperparameters, but it does not explicitly describe data splits for training, validation, and testing (e.g., an 80/10/10 split or specific counts).
Hardware Specification | Yes | Our code is based on PyTorch and we used an NVIDIA TITAN Xp.
Software Dependencies | No | The paper mentions software components such as PyTorch, DRQN, GRU, and RMSProp, but it does not provide specific version numbers for any of them.
Experiment Setup | Yes | The values of the hyperparameters are shown in Table 2. In MASER, the most recent 5000 episodes are stored in the replay buffer, and the mini-batch size is 32. MASER updates the target network every 200 episodes. We used RMSProp as the optimizer with a learning rate of 0.0005. The discount factor for the expected return is 0.99, and the value of epsilon for epsilon-greedy Q-learning starts at 1.0 and is annealed to 0.05 over 50000 time steps.
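As a quick reference, the hyperparameters quoted in the Experiment Setup row above can be collected into a single configuration sketch. The snippet below is illustrative only: the names (Config, epsilon_at) are not taken from the MASER codebase, and the linear annealing schedule is an assumption about how the reported epsilon start/end values and anneal time are applied.

```python
# Minimal sketch of the training hyperparameters reported in the paper (Table 2).
# Names are hypothetical; only the numeric values come from the paper.

from dataclasses import dataclass

@dataclass
class Config:
    buffer_size: int = 5000             # most recent episodes kept in the replay buffer
    batch_size: int = 32                # episodes sampled per update
    target_update_interval: int = 200   # episodes between target-network updates
    optimizer: str = "RMSProp"
    lr: float = 0.0005
    gamma: float = 0.99                 # discount factor for the return
    epsilon_start: float = 1.0
    epsilon_finish: float = 0.05
    epsilon_anneal_time: int = 50_000   # steps over which epsilon is annealed (assumed linear)

def epsilon_at(cfg: Config, t: int) -> float:
    """Linearly anneal epsilon from epsilon_start to epsilon_finish over epsilon_anneal_time steps."""
    frac = min(t / cfg.epsilon_anneal_time, 1.0)
    return cfg.epsilon_start + frac * (cfg.epsilon_finish - cfg.epsilon_start)

if __name__ == "__main__":
    cfg = Config()
    for t in (0, 25_000, 50_000, 100_000):
        print(t, round(epsilon_at(cfg, t), 3))  # 1.0, 0.525, 0.05, 0.05
```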