reproducibilityindex.ai

Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games

Authors: Jianhong Wang, Yuan Zhang, Tae-Kyun Kim, Yunjie Gu

Pages: 7285-7292

AAAI 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	"We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator and Traffic Junction, compared with the state-of-the-art algorithms, e.g., MADDPG, COMA, Independent DDPG and Independent A2C. In the experiments, SQDDPG shows a significant improvement on the convergence rate." and "We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016). In the experiments, we compare SQDDPG with two Independent algorithms (with decentralised critics), e.g., Independent DDPG (IDDPG) (Lillicrap et al. 2015) and Independent A2C (IA2C) (Sutton and Barto 2018), and two state-of-the-art methods with centralised critics, e.g., MADDPG (Lowe et al. 2017) and COMA (Foerster et al. 2018)."
Researcher Affiliation	Collaboration	Jianhong Wang (1, 2), Yuan Zhang (3), Tae-Kyun Kim (2), Yunjie Gu (1). (1) Control and Power Research Group, Imperial College London, UK; (2) Imperial Computer Vision and Learning Lab, Imperial College London, UK; (3) Laiye Network Technology Co. Ltd., China
Pseudocode	Yes	The pseudo code for the SQDDPG is given in Appendix.
Open Source Code	Yes	The code of experiments is available on: https://github.com/hsvgbkhgbv/SQDDPG
Open Datasets	Yes	We evaluate SQDDPG on Cooperative Navigation, Prey-and-Predator (Lowe et al. 2017) and Traffic Junction (Sukhbaatar, Szlam, and Fergus 2016).
Dataset Splits	No	The paper describes training and evaluation on environments (Cooperative Navigation, Prey-and-Predator, Traffic Junction) but does not explicitly specify train/validation/test dataset splits with percentages or sample counts, nor does it mention a dedicated validation set.
Hardware Specification	No	The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies	No	The paper mentions software components and algorithms like 'Adam optimizer', 'DDPG method', and 'Gumbel-Softmax trick', but it does not provide specific version numbers for any programming languages, libraries, or frameworks used in the implementation.
Experiment Setup	Yes	The details of experimental setups are given in Appendix.