Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games
Authors: Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, Yunjie Gu
Pages: 7285-7292
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator and Traffic Junction, compared with the state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG and Independent A2C. In the experiments, SQDDPG shows a significant improvement on the convergence rate. Separately, the paper states: We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016). In the experiments, we compare SQDDPG with two independent algorithms (with decentralised critics), e.g., Independent DDPG (IDDPG) (Lillicrap et al. 2015) and Independent A2C (IA2C) (Sutton and Barto 2018), and two state-of-the-art methods with centralised critics, e.g., MADDPG (Lowe et al. 2017) and COMA (Foerster et al. 2018). |
| Researcher Affiliation | Collaboration | Jianhong Wang (1, 2), Yuan Zhang (3), Tae-Kyun Kim (2), Yunjie Gu (1). Affiliations: (1) Control and Power Research Group, Imperial College London, UK; (2) Imperial Computer Vision and Learning Lab, Imperial College London, UK; (3) Laiye Network Technology Co. Ltd., China |
| Pseudocode | Yes | The pseudo code for the SQDDPG is given in Appendix. |
| Open Source Code | Yes | The code of experiments is available on: https://github.com/hsvgbkhgbv/SQDDPG |
| Open Datasets | Yes | We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016). |
| Dataset Splits | No | The paper describes training and evaluation on environments (Cooperative Navigation, Prey-and-Predator, Traffic Junction) but does not explicitly specify train/validation/test dataset splits with percentages or sample counts, nor does it mention a dedicated validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components and algorithms like 'Adam optimizer', 'DDPG method', and 'Gumbel-Softmax trick', but it does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation. |
| Experiment Setup | Yes | The details of experimental setups are given in Appendix. |