Solving Homogeneous and Heterogeneous Cooperative Tasks with Greedy Sequential Execution
Authors: Shanqi Liu, Dong Xing, Pengjie Gu, Xinrun Wang, Bo An, Yong Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated GSE in both homogeneous and heterogeneous scenarios. The results demonstrate that GSE achieves significant improvement in performance across multiple domains, especially in scenarios involving both homogeneous and heterogeneous tasks. |
| Researcher Affiliation | Collaboration | Shanqi Liu (1), Dong Xing (1), Pengjie Gu (2), Xinrun Wang (2), Bo An (2, 3), Yong Liu (1). Affiliations: 1: Zhejiang University; 2: Nanyang Technological University; 3: Skywork AI, Singapore |
| Pseudocode | No | The paper describes the proposed method and calculations (e.g., marginal contribution) in text and equations, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block (see the illustrative sketch after this table). |
| Open Source Code | No | The paper does not include any explicit statement about releasing open-source code for the described methodology or provide a link to a code repository. |
| Open Datasets | Yes | The experiments are conducted based on MAgent (Zheng et al., 2018) and Overcooked (Sarkar et al., 2022). |
| Dataset Splits | No | The paper does not explicitly provide details about train/validation/test dataset splits (e.g., percentages or sample counts). It mentions training parameters but not data partitioning for evaluation. |
| Hardware Specification | Yes | All experiments are carried out on the same computer, equipped with Intel(R) Xeon(R) Gold 5218R CPU @ 2.10GHz, 64GB RAM and an NVIDIA RTX3090. |
| Software Dependencies | No | The paper mentions 'the framework is PyTorch' but does not specify the version number for PyTorch or any other software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We set the discount factor as 0.99 and use the RMSprop optimizer with a learning rate of 5e-4 for the policy and 1e-3 for the critic. ϵ-greedy is used for exploration, with ϵ annealed linearly from 1.0 to 0.05 over 700k steps. The batch size is 4, and the target network is updated every 200 episodes. The length of each episode in MAgent is limited to 100 steps in bridge and 50 for the others, except for Multi-XOR, which is a single-step game. The sample number M of our method is 5 in all scenarios. (A configuration sketch based on these values follows the table.) |
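
The hyperparameters quoted in the 'Experiment Setup' row can be collected into a minimal PyTorch sketch. The numerical values (discount factor, RMSprop learning rates, ϵ schedule, batch size, target-update interval, sample number M) are taken directly from the row above; the placeholder networks and their sizes are assumptions for illustration, since the paper's architectures are not reproduced here.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the experiment-setup row above.
GAMMA = 0.99                         # discount factor
LR_POLICY, LR_CRITIC = 5e-4, 1e-3    # RMSprop learning rates (policy / critic)
EPS_START, EPS_END = 1.0, 0.05       # epsilon-greedy annealing endpoints
EPS_ANNEAL_STEPS = 700_000           # steps over which epsilon is annealed
BATCH_SIZE = 4
TARGET_UPDATE_EPISODES = 200         # target-network update interval
SAMPLE_M = 5                         # sample number M of the method

# Placeholder networks: the paper does not specify the architectures here,
# so these single linear layers are assumptions for illustration only.
policy_net = nn.Linear(16, 8)
critic_net = nn.Linear(16, 1)

policy_opt = torch.optim.RMSprop(policy_net.parameters(), lr=LR_POLICY)
critic_opt = torch.optim.RMSprop(critic_net.parameters(), lr=LR_CRITIC)

def epsilon(step: int) -> float:
    """Linearly anneal the exploration rate from 1.0 to 0.05 over 700k steps."""
    frac = min(step / EPS_ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)
```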
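
The 'Pseudocode' row notes that the paper describes its marginal-contribution calculation only in text and equations. Purely as an illustrative aid, the sketch below shows one generic way to select actions greedily in sequence and record each agent's marginal contribution to a joint value. The names `greedy_sequential_actions` and `joint_q` are hypothetical placeholders, and this is not the paper's actual algorithm or critic.

```python
def greedy_sequential_actions(joint_q, obs, n_agents, n_actions):
    """Choose actions one agent at a time, each agent maximizing the joint
    value given the actions already committed.

    `joint_q(obs, partial_actions)` is a hypothetical callable scoring a
    (possibly partial) joint action -- not the paper's exact model.
    """
    chosen, contributions = [], []
    prev_value = joint_q(obs, chosen)  # value of the empty partial action
    for _ in range(n_agents):
        # Score every candidate action for the current agent.
        values = [joint_q(obs, chosen + [a]) for a in range(n_actions)]
        best_a = max(range(n_actions), key=lambda a: values[a])
        chosen.append(best_a)
        # Marginal contribution: gain in joint value from this agent's choice.
        contributions.append(values[best_a] - prev_value)
        prev_value = values[best_a]
    return chosen, contributions

# Toy usage with a dummy scorer that counts actions matching the observation.
actions, margins = greedy_sequential_actions(
    joint_q=lambda obs, acts: float(sum(a == obs for a in acts)),
    obs=1, n_agents=3, n_actions=2)
```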