Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning

Authors: Ziluo Ding, Wanpeng Zhang, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, EnDi demonstrates the strong generalization ability to unseen games with new dynamics and expresses the superiority over existing methods. The code is available at https://github.com/PKU-RL/EnDi.
Researcher Affiliation | Collaboration | 1. School of Computer Science, Peking University; 2. inspir.ai; 3. Beijing Academy of Artificial Intelligence.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/PKU-RL/EnDi.
Open Datasets | Yes | We present two goal-based multi-agent environments based on two previous single-agent settings, i.e., MESSENGER (Hanjie et al., 2021) and RTFM (Zhong et al., 2019).
Dataset Splits | Yes | We use the validation games to save the model parameters with the highest validation win rate during training and use these parameters to evaluate the models on the test games. Note that the validation procedure follows the same settings of previous work (Hanjie et al., 2021).
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software components like IMPALA, RMSProp, PPO, the Adam optimizer, and the BERT-base model, but does not provide specific version numbers for these or other ancillary software dependencies.
Experiment Setup | Yes | We train using an implementation of IMPALA (Espeholt et al., 2018). In particular, we use 10 actors and a batch size of 20. When unrolling actors, we use a maximum unroll length of 80 steps. Each episode lasts for a maximum of 1000 steps. We optimize using RMSProp with a learning rate of 0.005, which is annealed linearly for 100 million steps. We set α = 0.99 and ϵ = 0.01.
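The Open Datasets row points to the two goal-based multi-agent environments (the MESSENGER and RTFM variants). As a rough illustration of how such environments are typically driven, the sketch below assumes a generic gym-style two-agent interface in which reset() returns a grid observation plus a text manual and step() takes one action per agent; these names and the interface are illustrative assumptions, not the released EnDi or MESSENGER API.

```python
# Illustrative rollout helper for a goal-based, language-grounded multi-agent
# environment. The env/policy interface below is an assumption for the sketch:
# reset() -> (observation, text manual), step(list_of_actions) -> (obs, r, done, info).
def rollout(env, policies, max_steps=1000):  # 1000 matches the quoted episode cap
    obs, manual = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        # one action per agent, each conditioned on the shared manual
        actions = [policy.act(obs, manual) for policy in policies]
        obs, reward, done, info = env.step(actions)
        total_reward += reward
        if done:
            break
    return total_reward
```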
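The Dataset Splits row describes the model-selection protocol: keep the parameters with the highest validation win rate during training and evaluate those parameters on the held-out test games. A minimal sketch of that loop, assuming a PyTorch-style policy and a user-supplied evaluate(policy, games) helper (both placeholders, not taken from the paper's code):

```python
import copy

def select_and_test(policy, train_one_chunk, evaluate, val_games, test_games,
                    num_chunks=100):
    """Track the best-validation checkpoint, then report test win rate with it."""
    best_val_win_rate = -1.0
    best_params = None
    for _ in range(num_chunks):
        train_one_chunk(policy)                      # placeholder training updates
        val_win_rate = evaluate(policy, val_games)   # win rate on validation games
        if val_win_rate > best_val_win_rate:
            best_val_win_rate = val_win_rate
            best_params = copy.deepcopy(policy.state_dict())  # assumes torch-style policy
    policy.load_state_dict(best_params)              # restore best-validation parameters
    return evaluate(policy, test_games)              # final number reported on test games
```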
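The Experiment Setup row fixes the optimizer hyperparameters: RMSProp with learning rate 0.005 annealed linearly over 100 million steps, α = 0.99, and ϵ = 0.01. A minimal PyTorch sketch of just those settings, with a placeholder network standing in for the actual EnDi model; the IMPALA actor/learner plumbing (10 actors, batch size 20, unroll length 80) is not reproduced here.

```python
import torch

policy = torch.nn.Linear(64, 5)  # placeholder for the actual EnDi network

# RMSProp settings quoted from the paper: lr 0.005, alpha 0.99, eps 0.01.
optimizer = torch.optim.RMSprop(policy.parameters(), lr=0.005, alpha=0.99, eps=0.01)

# Linear annealing of the learning rate down to zero over 100M steps; the paper
# does not say whether "steps" means environment frames or learner updates, so
# call scheduler.step() once per whichever unit is being annealed.
TOTAL_STEPS = 100_000_000
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: max(0.0, 1.0 - step / TOTAL_STEPS)
)
```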