Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning
Authors: Ziluo Ding, Wanpeng Zhang, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, EnDi demonstrates strong generalization to unseen games with new dynamics and outperforms existing methods. The code is available at https://github.com/PKU-RL/EnDi. |
| Researcher Affiliation | Collaboration | ¹School of Computer Science, Peking University; ²inspir.ai; ³Beijing Academy of Artificial Intelligence. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/PKU-RL/EnDi. |
| Open Datasets | Yes | We present two goal-based multi-agent environments based on two previous single-agent settings, i.e., MESSENGER (Hanjie et al., 2021) and RTFM (Zhong et al., 2019). |
| Dataset Splits | Yes | We use the validation games to save the model parameters with the highest validation win rate during training and use these parameters to evaluate the models on the test games. Note that the validation procedure follows the same settings of previous work (Hanjie et al., 2021). (A minimal sketch of this checkpoint-selection loop follows the table.) |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software components like IMPALA, RMSProp, PPO, Adam optimizer, and BERT-base model, but does not provide specific version numbers for these or other ancillary software dependencies. |
| Experiment Setup | Yes | We train using an implementation of IMPALA (Espeholt et al., 2018). In particular, we use 10 actors and a batch size of 20. When unrolling actors, we use a maximum unroll length of 80 steps. Each episode lasts for a maximum of 1000 steps. We optimize using RMSProp with a learning rate of 0.005, which is annealed linearly for 100 million steps. We set α = 0.99 and ϵ = 0.01. (These hyperparameters are collected in the configuration sketch below the table.) |
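
The hyperparameters quoted in the Experiment Setup row map directly onto a standard PyTorch RMSProp setup with a linearly annealed learning rate. The sketch below simply collects them in one place; it is a minimal illustration under those reported values, not the EnDi implementation, and the `Config` and `make_optimizer` names are ours.

```python
# Minimal sketch of the reported IMPALA-style training hyperparameters.
# The EnDi repo (https://github.com/PKU-RL/EnDi) defines its own config;
# `Config` and `make_optimizer` below are illustrative names only.
from dataclasses import dataclass

import torch


@dataclass
class Config:
    num_actors: int = 10             # parallel actors feeding the learner
    batch_size: int = 20             # learner batch size (in unrolls)
    unroll_length: int = 80          # maximum steps per actor unroll
    max_episode_steps: int = 1000    # episode length cap
    lr: float = 5e-3                 # RMSProp learning rate (0.005)
    total_steps: int = 100_000_000   # horizon of the linear LR anneal
    rmsprop_alpha: float = 0.99      # alpha = 0.99
    rmsprop_eps: float = 0.01        # epsilon = 0.01


def make_optimizer(model: torch.nn.Module, cfg: Config):
    """RMSProp whose learning rate is annealed linearly to zero over training."""
    opt = torch.optim.RMSprop(
        model.parameters(), lr=cfg.lr, alpha=cfg.rmsprop_alpha, eps=cfg.rmsprop_eps
    )
    # LambdaLR multiplies the base lr by the returned factor at each scheduler step.
    sched = torch.optim.lr_scheduler.LambdaLR(
        opt, lambda step: max(0.0, 1.0 - step / cfg.total_steps)
    )
    return opt, sched
```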
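
The validation protocol quoted in the Dataset Splits row amounts to periodically evaluating on held-out validation games and keeping the parameters with the highest validation win rate before testing. Below is a minimal sketch of that loop, assuming a PyTorch-style model; `train_step_fn` and `evaluate` are hypothetical helpers standing in for one learner update and a win-rate evaluation, and are not taken from the EnDi codebase.

```python
# Minimal sketch of validation-based checkpoint selection (not EnDi's code).
import copy


def select_best_checkpoint(model, train_step_fn, evaluate, val_games,
                           num_updates, eval_every):
    """Train, track the best validation win rate, and restore those parameters."""
    best_win_rate, best_state = -1.0, None
    for update in range(num_updates):
        train_step_fn(model)  # one learner update (e.g., one IMPALA batch)
        if (update + 1) % eval_every == 0:
            win_rate = evaluate(model, val_games)  # win rate on validation games
            if win_rate > best_win_rate:
                best_win_rate = win_rate
                best_state = copy.deepcopy(model.state_dict())
    if best_state is not None:
        # Parameters with the highest validation win rate, used for the test games.
        model.load_state_dict(best_state)
    return model, best_win_rate
```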