Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games

Authors: Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, Yunjie Gu
Pages: 7285-7292

AAAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator and Traffic Junction, compared with the state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG and Independent A2C. In the experiments, SQDDPG shows a significant improvement on the convergence rate. [...] We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016). In the experiments, we compare SQDDPG with two Independent algorithms (with decentralised critics), e.g., Independent DDPG (IDDPG) (Lillicrap et al. 2015) and Independent A2C (IA2C) (Sutton and Barto 2018), and two state-of-the-art methods with centralised critics, e.g., MADDPG (Lowe et al. 2017) and COMA (Foerster et al. 2018).
Researcher Affiliation | Collaboration | Jianhong Wang (1, 2), Yuan Zhang (3), Tae-Kyun Kim (2), Yunjie Gu (1). (1) Control and Power Research Group, Imperial College London, UK; (2) Imperial Computer Vision and Learning Lab, Imperial College London, UK; (3) Laiye Network Technology Co. Ltd., China
Pseudocode | Yes | The pseudo code for the SQDDPG is given in Appendix.
Open Source Code | Yes | The code of experiments is available on: https://github.com/hsvgbkhgbv/SQDDPG
Open Datasets | Yes | We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016).
Dataset Splits | No | The paper describes training and evaluation on environments (Cooperative Navigation, Prey-and-Predator, Traffic Junction) but does not explicitly specify train/validation/test dataset splits with percentages or sample counts, nor does it mention a dedicated validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions software components and algorithms like 'Adam optimizer', 'DDPG method', and 'Gumbel-Softmax trick', but it does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation.
Experiment Setup | Yes | The details of experimental setups are given in Appendix.