Detecting Influence Structures in Multi-Agent Reinforcement Learning

Authors: Fabian Raoul Pieroth, Katherine Fitch, Lenz Belzner

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through empirical studies, we validate our approach's effectiveness in identifying intricate influence structures in complex interactions. Our work appears to be the first study of determining influence structures in the multi-agent average reward setting with convergence guarantees.
Researcher Affiliation | Academia | (1) School of Computation, Science and Technology, Technical University of Munich, Germany; (2) formerly, Chair of Operations Research, Technical University of Munich, Germany; (3) Technische Hochschule Ingolstadt, Germany. Correspondence to: Fabian R. Pieroth <fabian.pieroth@tum.de>.
Pseudocode | No | The paper refers to "Algorithm 1 of Zhang et al. (2018)" but does not include any pseudocode or algorithm blocks for its own methods within the document.
Open Source Code | No | The paper states it leverages "Stable Baselines3 (SB3)" and modifies the "DQN implementation of SB3", but it does not provide an explicit statement or link to its own open-source code for the proposed methodology.
Open Datasets | No | The paper describes generating a "random multi-agent MDP" and using the "coin game" environment, but it does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset. A seeded-generator sketch is given after this table.
Dataset Splits | No | The paper compares trained approximations against analytically determined values, but it does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or an explicit cross-validation setup).
Hardware Specification | Yes | All of our experiments specific to the coin game were conducted using a consumer-grade Nvidia Geforce RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using "Stable Baselines3 (SB3)", the "PPO algorithm", and a modified "DQN implementation" for its deep SARSA algorithm, but it does not provide specific version numbers for these software components. A version-logging sketch follows the table.
Experiment Setup | Yes | A summary of the used hyperparameters is given in Table 1. The experiments to evaluate the approximation algorithms for SIM and TIM have several tuneable hyperparameters. First, we determined the learning rates $\alpha$ and $\beta$ for the SARSA approximation algorithm (Sutton & Barto, 2018), the initial learning rates $\alpha_0^{\mathrm{SIM}}$ and $\alpha_0^{\mathrm{TIM}}$, and the decay rates $d_{\mathrm{decay}}^{\mathrm{SIM}}$ and $d_{\mathrm{decay}}^{\mathrm{TIM}}$ of the SIM and TIM approximation algorithms. A decay-schedule sketch is given below.
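
Because the evaluation environments are generated rather than downloaded, reproducing the "random multi-agent MDP" experiments hinges on seeding the generator. The sketch below is a minimal illustration under stated assumptions; the function name, the Dirichlet/uniform sampling choices, and the array shapes are all assumptions, not the authors' code.

```python
import numpy as np

def make_random_mmdp(n_agents, n_states, n_actions, seed=0):
    """Sample a random multi-agent MDP. All names and distributional
    choices here are illustrative assumptions, not the paper's code."""
    rng = np.random.default_rng(seed)
    n_joint = n_actions ** n_agents  # size of the joint action space
    # Transition kernel P[s, a, s']: each (s, a) row is a distribution
    # over next states, sampled from a flat Dirichlet.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_joint))
    # Per-agent reward tables R[i, s, a], sampled uniformly on [0, 1].
    R = rng.uniform(0.0, 1.0, size=(n_agents, n_states, n_joint))
    return P, R

# Fixing the seed makes the sampled environment reproducible.
P, R = make_random_mmdp(n_agents=2, n_states=5, n_actions=2, seed=42)
```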
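Since no version numbers are reported, a common mitigation when re-running such experiments is to log the software environment at execution time. This sketch assumes a standard SB3/PyTorch installation and records whatever versions are present locally, not the (unknown) versions the authors used.

```python
# Log the runtime environment next to the results. The assessed paper
# names SB3 but no versions, so this only records the local install.
import sys

import stable_baselines3
import torch

print("python            ", sys.version.split()[0])
print("stable-baselines3 ", stable_baselines3.__version__)
print("torch             ", torch.__version__)
```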
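The quoted setup names initial learning rates and decay rates for the SIM and TIM approximation algorithms, but the excerpt does not state the functional form of the schedule. A minimal sketch, assuming the common $\alpha_t = \alpha_0 / (1 + d_{\mathrm{decay}} \cdot t)$ form; both the form and the numeric values below are assumptions, not the paper's Table 1.

```python
def decayed_rate(alpha_0: float, d_decay: float, t: int) -> float:
    """Step size alpha_t = alpha_0 / (1 + d_decay * t). The functional
    form is an assumption; the excerpt only names alpha_0 and d_decay
    for SIM/TIM, not how they enter the schedule."""
    return alpha_0 / (1.0 + d_decay * t)

# Illustrative values only: the SIM step size after 1000 updates.
alpha_sim = decayed_rate(alpha_0=0.1, d_decay=1e-3, t=1000)  # -> 0.05
```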