Decentralized Q-learning in Zero-sum Markov Games
Authors: Muhammed Sayin, Kaiqing Zhang, David Leslie, Tamer Basar, Asuman Ozdaglar
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We also verify the convergence of the learning dynamics via numerical examples. All the simulations are executed on a desktop computer equipped with a 3.7 GHz Hexa-Core Intel Core i7-8700K processor with Matlab R2019b. The device also has two 8GB 3000MHz DDR4 memories and a NVIDIA GeForce GTX 1080 8GB GDDR5X graphic card. For illustration, we consider a zero-sum Markov game with 5 states and 3 actions at each state, i.e., S = {1, 2, ..., 5} and A^i_s = {1, 2, 3}. |
| Researcher Affiliation | Academia | Muhammed O. Sayin, Bilkent University, sayin@ee.bilkent.edu.tr; Kaiqing Zhang, MIT, kaiqing@mit.edu; David S. Leslie, Lancaster University, d.leslie@lancaster.ac.uk; Tamer Başar, UIUC, basar1@illinois.edu; Asuman Ozdaglar, MIT, asuman@mit.edu |
| Pseudocode | Yes | Table 1: Decentralized Q-learning dynamics in Markov games |
| Open Source Code | Yes | 3. If you ran experiments... (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes] See the supplementary material. |
| Open Datasets | No | For illustration, we consider a zero-sum Markov game with 5 states and 3 actions at each state, i.e., S = {1, 2, ..., 5} and A^i_s = {1, 2, 3}. The discount factor γ = 0.6. The reward functions are chosen randomly in a way that r^1_s(a^1, a^2) ∝ r_{s,a^1,a^2} · exp(s^2) for s ∈ S, where r_{s,a^1,a^2} is uniformly drawn from [−1, 1]. |
| Dataset Splits | No | 3. If you ran experiments... (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [N/A] There is no training for the algorithm. |
| Hardware Specification | Yes | All the simulations are executed on a desktop computer equipped with a 3.7 GHz Hexa-Core Intel Core i7-8700K processor with Matlab R2019b. The device also has two 8GB 3000MHz DDR4 memories and a NVIDIA GeForce GTX 1080 8GB GDDR5X graphic card. |
| Software Dependencies | Yes | Matlab R2019b |
| Experiment Setup | Yes | The discount factor γ = 0.6. For both cases, we choose α_c = 1/c^0.9 and β_c = 1/c with ρ_α = 0.9, ρ_β = 1, and ρ = 0.7, and set τ_c in accordance with (11) and (12), respectively. For Case 1, we choose ϵ = 2 × 10^−4 and τ = 4.5 × 10^4; for Case 2, we choose τ = 0.07. |
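
The setup quoted in the table can be sketched roughly as follows. This is a minimal Python illustration, not the authors' Matlab code: the names `make_game` and `step_sizes` are hypothetical, and the max-normalization of the rewards and the Dirichlet-sampled transition kernel are assumptions made only so the example is complete, since the quoted text does not specify them.

```python
import numpy as np


def make_game(n_states=5, n_actions=3, seed=0):
    """Sample a random zero-sum Markov game resembling the quoted setup."""
    rng = np.random.default_rng(seed)
    # Base rewards drawn uniformly from [-1, 1], one per (state, a1, a2).
    base = rng.uniform(-1.0, 1.0, size=(n_states, n_actions, n_actions))
    # Scale state s (1-indexed) by exp(s^2), as in the quoted description.
    states = np.arange(1, n_states + 1, dtype=float)
    r1 = base * np.exp(states ** 2)[:, None, None]
    # ASSUMPTION: normalize so that max |reward| = 1 (not stated in the source).
    r1 = r1 / np.abs(r1).max()
    # ASSUMPTION: transition kernel sampled from a flat Dirichlet distribution;
    # P[s, a1, a2, :] is a probability vector over next states.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions, n_actions))
    # Zero-sum: player 2's reward is the negative of player 1's.
    return r1, -r1, P


def step_sizes(c, rho_alpha=0.9, rho_beta=1.0):
    """Return (alpha_c, beta_c) = (c^-rho_alpha, c^-rho_beta) for visit count c >= 1."""
    c = np.asarray(c, dtype=float)
    return c ** -rho_alpha, c ** -rho_beta


GAMMA = 0.6  # discount factor quoted in the table

if __name__ == "__main__":
    r1, r2, P = make_game()
    alpha, beta = step_sizes(np.array([1, 10, 100]))
    print(r1.shape, P.shape)
    print(alpha, beta)
```

With ρ_α = 0.9 < ρ_β = 1, the step size α_c decays more slowly than β_c, so the two updates run on different timescales.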