Detecting Influence Structures in Multi-Agent Reinforcement Learning
Authors: Fabian Raoul Pieroth, Katherine Fitch, Lenz Belzner
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical studies, we validate our approach's effectiveness in identifying intricate influence structures in complex interactions. Our work appears to be the first study of determining influence structures in the multi-agent average reward setting with convergence guarantees. |
| Researcher Affiliation | Academia | ¹School of Computation, Information and Technology, Technical University of Munich, Germany; ²Formerly, Chair of Operations Research, Technical University of Munich, Germany; ³Technische Hochschule Ingolstadt, Germany. Correspondence to: Fabian R. Pieroth <fabian.pieroth@tum.de>. |
| Pseudocode | No | The paper refers to "Algorithm 1 of Zhang et al. (2018)" but does not include pseudocode or algorithm blocks for its own methods. |
| Open Source Code | No | The paper states it leverages "Stable Baselines3 (SB3)" and modifies the "DQN implementation of SB3" but does not provide an explicit statement or link to its own open-source code for the proposed methodology (a hypothetical reconstruction of the DQN-to-SARSA change is sketched after the table). |
| Open Datasets | No | The paper describes generating a "random multi-agent MDP" and using the "coin game" environment, but it does not provide concrete access information (link, DOI, formal citation) to a publicly available or open dataset that was used (one assumed construction of such a random MDP is sketched after the table). |
| Dataset Splits | No | The paper discusses training and comparison with analytically determined values, but it does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or explicit cross-validation setup). |
| Hardware Specification | Yes | All of our experiments specific to the coin game were conducted using a consumer-grade Nvidia Geforce RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using "Stable Baselines3 (SB3)", the "PPO algorithm", and the "DQN implementation" for the deep SARSA algorithm, but it does not provide specific version numbers for these software components (a version-recording snippet follows the table). |
| Experiment Setup | Yes | A summary of the hyperparameters used is given in Table 1. The experiments to evaluate the approximation algorithms for SIM and TIM have several tunable hyperparameters. First, we determined the learning rates α and β for the SARSA approximation algorithm (Sutton & Barto, 2018), the initial learning rates α₀^SIM and α₀^TIM, and the decay rates d_decay^SIM and d_decay^TIM of the SIM and TIM approximation algorithms (a worked sketch of this update follows the table). |
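On the "Open Source Code" row: since no code is released, the reported modification of SB3's DQN into a deep SARSA learner can only be reconstructed. The sketch below is a hypothetical illustration of the one change that matters, the bootstrap target; the function names and the `next_actions` buffer field are assumptions, not the authors' code, and the discounted form shown here would be replaced by an average-reward baseline in the paper's setting.

```python
import torch

def dqn_target(q_net, rewards, next_obs, gamma):
    # Off-policy DQN target: bootstrap with the greedy (max) next-action value.
    with torch.no_grad():
        next_q = q_net(next_obs).max(dim=1).values
    return rewards + gamma * next_q

def sarsa_target(q_net, rewards, next_obs, next_actions, gamma):
    # On-policy SARSA target: bootstrap with the action actually taken next.
    # This requires the replay buffer to also store next_actions, which SB3's
    # stock DQN buffer does not; that is the non-trivial part of any such
    # modification.
    with torch.no_grad():
        next_q = q_net(next_obs).gather(1, next_actions.unsqueeze(1)).squeeze(1)
    return rewards + gamma * next_q
```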
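On the "Open Datasets" row: the "random multi-agent MDP" is generated rather than downloaded, so reproduction hinges on the unstated sampling scheme. A minimal sketch of one common construction is below; the Dirichlet and uniform distributional choices are assumptions, chosen only because they are a frequent default.

```python
import numpy as np

def random_multi_agent_mdp(num_states, num_joint_actions, seed=0):
    """Hypothetical generator for a random MDP over joint actions."""
    rng = np.random.default_rng(seed)
    # transitions[s, a] is a probability distribution over successor states.
    transitions = rng.dirichlet(np.ones(num_states),
                                size=(num_states, num_joint_actions))
    # One bounded reward per (state, joint-action) pair.
    rewards = rng.uniform(0.0, 1.0, size=(num_states, num_joint_actions))
    return transitions, rewards
```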
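On the "Software Dependencies" row: because no versions are pinned, a re-run should at minimum record the versions it actually used. A small snippet for that, assuming the SB3 2.x stack (where `gymnasium` replaced `gym`):

```python
# Record the library versions a reproduction actually ran with, since the
# paper names Stable Baselines3 / PPO / DQN but pins no version numbers.
import stable_baselines3
import torch
import gymnasium  # assumes the SB3 2.x stack; older setups used `gym`

print("stable-baselines3:", stable_baselines3.__version__)
print("torch:", torch.__version__)
print("gymnasium:", gymnasium.__version__)
```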
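On the "Experiment Setup" row: the two step sizes α and β match the standard average-reward (differential) SARSA update of Sutton & Barto (2018), and the α₀/d_decay pairs suggest a decayed schedule for the SIM and TIM estimators. The sketch below shows the tabular differential SARSA update with a hypothetical 1/(1 + d·t) decay; the paper's actual schedule and environment interface are not specified, so both are assumptions.

```python
import numpy as np

def differential_sarsa(env, num_states, num_actions, alpha0=0.1, beta=0.01,
                       d_decay=1e-4, epsilon=0.1, steps=100_000, seed=0):
    """Tabular average-reward SARSA (Sutton & Barto, 2018, Sec. 10.3).

    `env` is assumed to expose reset() -> state and step(action) ->
    (state, reward); the 1/(1 + d*t) decay is an assumed schedule, since
    the paper names decay rates but not their functional form.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((num_states, num_actions))
    avg_reward = 0.0  # running estimate of the reward rate, step size beta

    def policy(state):
        # Epsilon-greedy behaviour policy over the current value estimates.
        if rng.random() < epsilon:
            return int(rng.integers(num_actions))
        return int(np.argmax(q[state]))

    s = env.reset()
    a = policy(s)
    for t in range(steps):
        s_next, r = env.step(a)
        a_next = policy(s_next)
        alpha = alpha0 / (1.0 + d_decay * t)  # decayed value-update step size
        td_error = r - avg_reward + q[s_next, a_next] - q[s, a]
        avg_reward += beta * td_error   # step size beta for the reward rate
        q[s, a] += alpha * td_error     # step size alpha for the action values
        s, a = s_next, a_next
    return q, avg_reward
```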