A Formal Model for Multiagent Q-Learning Dynamics on Regular Graphs

Authors: Chen Chu, Yong Li, Jinzhuo Liu, Shuyue Hu, Xuelong Li, Zhen Wang

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comparisons with agent-based simulations on different types of regular graphs, we show that our model describes the agent learning dynamics in an exact manner. In the experiments, we consider two specific types of regular graphs: translational symmetric lattice and random regular graphs. We validate our model by showing that the Q-learning dynamics predicted by our model match the actual dynamics in agent-based simulations across different games and different regular graphs.
Researcher Affiliation | Collaboration | Chen Chu (1,2), Yong Li (3), Jinzhuo Liu (2,3), Shuyue Hu (4), Xuelong Li (2) and Zhen Wang (2,5). 1 School of Statistics and Mathematics, Yunnan University of Finance and Economics; 2 School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University; 3 School of Software, Yunnan University; 4 Shanghai Artificial Intelligence Laboratory; 5 School of Cybersecurity, Northwestern Polytechnical University
Pseudocode | Yes | Algorithm 1: A Multiagent Q-Learning Framework on Regular Graphs
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | No | The paper describes 'Game Configurations' including the Prisoner's Dilemma (PD), Hawk-Dove (HD), and Common Interest (CI) games. These are conceptual game setups rather than publicly available datasets, and no access information is provided.
Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or methodology) needed to reproduce data partitioning for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | The exploration temperature β is set to 2 and the learning rate α is set to 0.4. Unless otherwise specified, we set the initial Q-values to zero for all agents, which means that agents initially take their actions randomly.
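
For context on the reported setup, below is a minimal agent-based sketch of stateless multiagent Q-learning with Boltzmann exploration on a regular graph. Only β = 2, α = 0.4, and the zero initial Q-values come from the paper's stated configuration; the ring-lattice degree, the Prisoner's Dilemma payoff values, the averaging of payoffs over neighbours, the number of agents and steps, and the absence of a discount term are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch: stateless multiagent Q-learning with Boltzmann exploration
# on a ring lattice (a regular graph). Hyperparameters beta=2, alpha=0.4 and
# zero initial Q-values follow the paper's stated setup; the graph, the PD
# payoffs, and the payoff aggregation are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

N, ALPHA, BETA = 100, 0.4, 2.0   # 100 agents (assumed); learning rate and temperature (stated)
PAYOFF = np.array([[3, 0],       # illustrative Prisoner's Dilemma payoffs (assumed values)
                   [5, 1]])      # rows: own action (0=cooperate, 1=defect); cols: neighbour's action

Q = np.zeros((N, 2))             # zero initial Q-values -> uniform random actions at the start

# Ring lattice of degree 4: each agent is linked to its two nearest neighbours on each side.
neighbours = [[(i + d) % N for d in (-2, -1, 1, 2)] for i in range(N)]

def boltzmann(q):
    """Softmax (Boltzmann) action probabilities with temperature BETA."""
    e = np.exp(BETA * (q - q.max()))
    return e / e.sum()

for step in range(500):
    # Every agent samples an action from its Boltzmann policy over its Q-values.
    actions = np.array([rng.choice(2, p=boltzmann(Q[i])) for i in range(N)])
    # Each agent receives the average payoff against its neighbours (assumed aggregation).
    rewards = np.array([PAYOFF[actions[i], actions[neighbours[i]]].mean() for i in range(N)])
    # Stateless Q-learning update for the chosen action (no discount term here for brevity;
    # the paper's exact update rule may differ).
    Q[np.arange(N), actions] += ALPHA * (rewards - Q[np.arange(N), actions])

print("final cooperation rate:", 1.0 - actions.mean())
```

Running the sketch gives one agent-based trajectory of the kind the paper compares its analytical model against; the Hawk-Dove and Common Interest games mentioned above would be obtained by swapping in different payoff matrices.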