Solving Homogeneous and Heterogeneous Cooperative Tasks with Greedy Sequential Execution

Authors: Shanqi Liu, Dong Xing, Pengjie Gu, Xinrun Wang, Bo An, Yong Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated GSE in both homogeneous and heterogeneous scenarios. The results demonstrate that GSE achieves significant improvement in performance across multiple domains, especially in scenarios involving both homogeneous and heterogeneous tasks.
Researcher Affiliation | Collaboration | Shanqi Liu (1), Dong Xing (1), Pengjie Gu (2), Xinrun Wang (2), Bo An (2,3), Yong Liu (1); affiliations: 1 Zhejiang University, 2 Nanyang Technological University, 3 Skywork AI, Singapore
Pseudocode | No | The paper describes the proposed method and its calculations (e.g., the marginal contribution) in text and equations, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block (an illustrative sketch of the greedy selection step follows the table).
Open Source Code | No | The paper does not include any explicit statement about releasing open-source code for the described methodology or provide a link to a code repository.
Open Datasets | Yes | The experiments are conducted on MAgent (Zheng et al., 2018) and Overcooked (Sarkar et al., 2022).
Dataset Splits | No | The paper does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts). It mentions training parameters but not data partitioning for evaluation.
Hardware Specification | Yes | All experiments are carried out on the same computer, equipped with an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 64GB of RAM, and an NVIDIA RTX 3090.
Software Dependencies | No | The paper mentions that 'the framework is PyTorch' but does not specify the version of PyTorch or of any other software dependency, which is required for reproducibility.
Experiment Setup | Yes | We set the discount factor to 0.99 and use the RMSprop optimizer with a learning rate of 5e-4 for the policy and 1e-3 for the critic. ϵ-greedy is used for exploration, with ϵ annealed linearly from 1.0 to 0.05 over 700k steps. The batch size is 4, and the target network is updated every 200 episodes. The length of each episode in MAgent is limited to 100 steps in bridge and 50 in the other scenarios, except for Multi-XOR, which is a single-step game. The sample number M of our method is 5 in all scenarios. (A configuration sketch of these hyperparameters follows the table.)
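
For illustration, the sketch below shows one way a greedy sequential action-selection step based on marginal contributions could look, matching the textual description the Pseudocode row refers to. This is a minimal sketch, not the authors' algorithm: the joint_q critic interface, the function name, and the dict-based partial-action encoding are all hypothetical assumptions.

def greedy_sequential_actions(joint_q, agents, action_spaces):
    """Pick each agent's action in sequence, scoring candidates by their
    marginal contribution to the joint value of the committed partial action.
    joint_q(partial) is a hypothetical critic valuing a partial joint action
    given as a dict {agent: action}; it stands in for the learned critic."""
    chosen = {}
    base_value = joint_q(chosen)  # value of the empty partial assignment
    for agent in agents:
        best_action, best_gain = None, float("-inf")
        for action in action_spaces[agent]:
            candidate = {**chosen, agent: action}
            gain = joint_q(candidate) - base_value  # marginal contribution
            if gain > best_gain:
                best_action, best_gain = action, gain
        chosen[agent] = best_action
        base_value = joint_q(chosen)  # commit and extend the partial action
    return chosen

# Toy usage with a hypothetical additive critic:
# joint_q = lambda partial: float(sum(partial.values()))
# greedy_sequential_actions(joint_q, ["a1", "a2"], {"a1": [0, 1], "a2": [0, 2]})
# -> {"a1": 1, "a2": 2}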
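
As a convenience, the training settings quoted in the Experiment Setup row translate into a configuration like the following sketch. Only the numeric values come from the paper; the dictionary keys, the toy networks, and the PyTorch optimizer wiring are illustrative assumptions.

import torch

# Hyperparameters quoted from the paper; everything else here is illustrative.
CONFIG = {
    "discount_factor": 0.99,
    "lr_policy": 5e-4,
    "lr_critic": 1e-3,
    "eps_start": 1.0,
    "eps_end": 0.05,
    "eps_anneal_steps": 700_000,
    "batch_size": 4,
    "target_update_episodes": 200,
    "sample_number_M": 5,
}

def epsilon(step):
    """Linear annealing of epsilon from 1.0 to 0.05 over 700k steps."""
    frac = min(step / CONFIG["eps_anneal_steps"], 1.0)
    return CONFIG["eps_start"] + frac * (CONFIG["eps_end"] - CONFIG["eps_start"])

# Hypothetical stand-in networks; RMSprop with the stated per-network rates.
policy_net = torch.nn.Linear(8, 4)
critic_net = torch.nn.Linear(8, 1)
policy_opt = torch.optim.RMSprop(policy_net.parameters(), lr=CONFIG["lr_policy"])
critic_opt = torch.optim.RMSprop(critic_net.parameters(), lr=CONFIG["lr_critic"])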