Decentralized Q-learning in Zero-sum Markov Games

Authors: Muhammed Sayin, Kaiqing Zhang, David Leslie, Tamer Basar, Asuman Ozdaglar

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also verify the convergence of the learning dynamics via numerical examples. All the simulations are executed on a desktop computer equipped with a 3.7 GHz Hexa-Core Intel Core i7-8700K processor, running Matlab R2019b. The device also has two 8GB 3000MHz DDR4 memory modules and an NVIDIA GeForce GTX 1080 8GB GDDR5X graphics card. For illustration, we consider a zero-sum Markov game with 5 states and 3 actions at each state, i.e., S = {1, 2, ..., 5} and A^i_s = {1, 2, 3}.
Researcher Affiliation | Academia | Muhammed O. Sayin (Bilkent University, sayin@ee.bilkent.edu.tr); Kaiqing Zhang (MIT, kaiqing@mit.edu); David S. Leslie (Lancaster University, d.leslie@lancaster.ac.uk); Tamer Başar (UIUC, basar1@illinois.edu); Asuman Ozdaglar (MIT, asuman@mit.edu)
Pseudocode | Yes | Table 1: Decentralized Q-learning dynamics in Markov games
Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplementary material.
Open Datasets | No | For illustration, we consider a zero-sum Markov game with 5 states and 3 actions at each state, i.e., S = {1, 2, ..., 5} and A^i_s = {1, 2, 3}. The discount factor γ = 0.6. The reward functions are chosen randomly in a way that r^1_s(a^1, a^2) = r_{s,a^1,a^2} exp(s^2) for s ∈ S, where r_{s,a^1,a^2} is uniformly drawn from [-1, 1]. (A construction sketch follows the table.)
Dataset Splits | No | 3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A] There is no training for the algorithm.
Hardware Specification | Yes | All the simulations are executed on a desktop computer equipped with a 3.7 GHz Hexa-Core Intel Core i7-8700K processor, running Matlab R2019b. The device also has two 8GB 3000MHz DDR4 memory modules and an NVIDIA GeForce GTX 1080 8GB GDDR5X graphics card.
Software Dependencies | Yes | Matlab R2019b
Experiment Setup | Yes | The discount factor γ = 0.6. For both cases, we choose α_c = 1/c^0.9 and β_c = 1/c with ρ_α = 0.9, ρ_β = 1, and ρ = 0.7, and set τ_c in accordance with (11) and (12), respectively. For Case 1, we choose ϵ = 2 × 10^-4 and τ = 4.5 × 10^4; for Case 2, we choose τ = 0.07. (An illustrative sketch of the dynamics with these step sizes follows the table.)
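
For concreteness, the MATLAB sketch below (matching the Matlab R2019b environment listed above) shows one way to generate a game of the kind described in the Open Datasets row: 5 states, 3 actions per player in each state, γ = 0.6, and rewards r_{s,a^1,a^2} drawn uniformly from [-1, 1] and scaled by exp(s^2). The transition kernel P and the random seed are not specified in the quoted text, so they are assumptions made purely for illustration.

```matlab
% Sketch of the randomly generated zero-sum Markov game quoted above:
% 5 states, 3 actions per player in each state, discount factor 0.6.
% The transition kernel is NOT described in the quoted text, so a
% uniformly random row-stochastic kernel P is assumed here.
rng(0);                                  % fixed seed, for illustration only
nS = 5;  nA = 3;  gamma = 0.6;

% Player 1's reward: r1(s,a1,a2) = rbar(s,a1,a2)*exp(s^2),
% with rbar drawn uniformly from [-1,1]; player 2 receives -r1.
rbar = 2*rand(nS, nA, nA) - 1;
r1   = rbar .* reshape(exp((1:nS).^2), [nS, 1, 1]);
r2   = -r1;

% Assumed transitions: P(s,a1,a2,:) is a probability vector over next states.
P = rand(nS, nA, nA, nS);
P = P ./ sum(P, 4);
```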
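
The following sketch is a simplified rendering of two-timescale decentralized learning dynamics in the spirit of Table 1, reusing the game objects from the sketch above and the Case 2 parameters quoted in the Experiment Setup row (α_c = 1/c^0.9, β_c = 1/c, fixed τ = 0.07). The exact update rules and the τ_c schedules of (11) and (12) are those in the paper; everything here beyond the quoted parameters is an assumption, not the authors' code.

```matlab
% Minimal sketch of two-timescale decentralized Q-learning dynamics in the
% spirit of Table 1, reusing nS, nA, gamma, r1, r2, P from the sketch above
% and the Case 2 parameters quoted in the table (alpha_c = 1/c^0.9,
% beta_c = 1/c, fixed temperature tau = 0.07). Illustrative only.
T   = 1e5;                       % number of stages to simulate
tau = 0.07;                      % smoothed best-response temperature
q   = zeros(nS, nA, 2);          % local q-functions, one per player
v   = zeros(nS, 2);              % local value estimates, one per player
cnt = zeros(nS, 1);              % per-state visit counters
sbr = @(z) exp(z - max(z)) / sum(exp(z - max(z)));   % logit (smoothed) response
s   = 1;                         % initial state

for t = 1:T
    cnt(s) = cnt(s) + 1;
    c      = cnt(s);
    alpha  = 1 / c^0.9;          % fast timescale: local q-function update
    beta   = 1 / c;              % slow timescale: value-estimate update
    a = zeros(1, 2);
    for i = 1:2                  % each player acts on its local q only
        a(i) = find(rand <= cumsum(sbr(q(s, :, i) / tau)), 1);
    end
    rew   = [r1(s, a(1), a(2)), r2(s, a(1), a(2))];
    snext = find(rand <= cumsum(squeeze(P(s, a(1), a(2), :))'), 1);
    for i = 1:2
        % temporal-difference update of the played action's local q-value
        q(s, a(i), i) = q(s, a(i), i) + ...
            alpha * (rew(i) + gamma * v(snext, i) - q(s, a(i), i));
        % value estimate tracks the smoothed-best-response value of local q
        pi_i    = sbr(q(s, :, i) / tau);
        v(s, i) = v(s, i) + beta * (pi_i * q(s, :, i)' - v(s, i));
    end
    s = snext;
end
```

The per-state visit counter c drives both step sizes: since α_c = 1/c^0.9 decays more slowly than β_c = 1/c, the local q-function adapts on the faster timescale while the value estimate evolves more slowly, reflecting the two-timescale structure indicated by the quoted choice of ρ_α = 0.9 and ρ_β = 1.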