Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Authors: Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., N-chain), a video game, and a real-world problem in energy systems. Our experimental results support that our framework can achieve better performance, even under conditions of misspecified transition models.
Researcher Affiliation | Academia | Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu, Duke University, {hao-lun.hsu,weixin.wang,miroslav.pajic,pan.xu}@duke.edu
Pseudocode | Yes | A unified algorithmic framework is presented in Algorithm 1, where each agent executes Least-Squares Value Iteration (LSVI) in parallel and makes decisions based on the collective data obtained from communication between each agent and the server. (A minimal sketch of this cooperative LSVI loop follows the table.)
Open Source Code | Yes | The implementation of this work can be found at https://github.com/panxulab/MARL-CoopTS
Open Datasets | Yes | We evaluate our proposed method on multiple parallel RL environments, including a deep exploration problem (i.e., N-chain), a video game, and a real-world problem in energy systems. (An illustrative N-chain sketch follows the table.)
Dataset Splits | No | The paper does not explicitly specify training, validation, and test splits for the datasets. It describes episodic reinforcement learning settings rather than data partitioning for supervised learning.
Hardware Specification | Yes | Note that we run all our experiments on Nvidia RTX A5000 with 24GB RAM.
Software Dependencies | No | The paper mentions software components such as deep Q-networks (DQNs), Adam SGLD, PyTorch, and ReLU, but does not specify their version numbers.
Experiment Setup | Yes | We list the details of all swept hyper-parameters in N-chain for PHE and LMC in Table 2 and Table 3, respectively. Specifically, PHE is trained with reward noise ϵ_h^{k,l,n} = 10^2 and regularizer noise ξ_h^{k,n} = 10^3 in (3.5), and LMC is trained with β_{m,k} = 10^2 in (3.7), optimized by Adam SGLD [33] with α_1 = 0.9, α_2 = 0.999 and bias factor α = 0.1. The final hyper-parameters used in N-chain are presented in Table 4. ... The detailed hyper-parameters for the Super Mario Bros task are presented in Table 5. ... The hyper-parameters we used are in Table 6. (An illustrative Adam SGLD step is sketched after the table.)
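
For readers who want a concrete picture of the framework summarized in the Pseudocode row, below is a minimal sketch, assuming a tabular setting: each agent appends its transitions to a buffer pooled by the server, and least-squares value iteration with perturbed rewards (a PHE-style randomization) is run on the shared data. The names `lsvi_phe`, `pooled`, and `noise_std` are illustrative and not taken from the released code, and the regularizer perturbation used by PHE in the paper is omitted for brevity.

```python
# Hypothetical sketch of cooperative LSVI with perturbed-history exploration (PHE):
# all agents' transitions are pooled by a server, and each agent plans on the pooled data.
import numpy as np

def lsvi_phe(pooled, num_states, num_actions, horizon, noise_std=0.1, reg=1.0):
    """One planning round on pooled transitions (step, s, a, r, s_next) from all agents."""
    Q = np.zeros((horizon + 1, num_states, num_actions))
    for h in reversed(range(horizon)):                       # backward value iteration
        counts = np.full((num_states, num_actions), reg)     # ridge-style regularization
        targets = np.zeros((num_states, num_actions))
        for step, s, a, r, s_next in pooled:
            if step != h:
                continue
            noisy_r = r + np.random.normal(0.0, noise_std)   # randomization via reward noise
            targets[s, a] += noisy_r + Q[h + 1, s_next].max()
            counts[s, a] += 1.0
        Q[h] = targets / counts                               # regularized least-squares solution
    return Q

# Toy usage: two agents each contributed one transition to the shared buffer.
pooled = [(0, 0, 1, 1.0, 2), (0, 0, 0, 0.0, 1)]
Q = lsvi_phe(pooled, num_states=3, num_actions=2, horizon=2)
greedy_action = Q[0, 0].argmax()
```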
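
The N-chain task named in the Research Type and Open Datasets rows is a standard deep-exploration benchmark. A minimal version is sketched below; the chain length, rewards, and horizon are assumptions for illustration and may not match the paper's configuration.

```python
# Minimal N-chain environment sketch (illustrative; parameters are not the paper's).
class NChain:
    def __init__(self, n=25, horizon=None):
        self.n = n
        self.horizon = horizon if horizon is not None else n + 9
        self.reset()

    def reset(self):
        self.state, self.t = 1, 0
        return self.state

    def step(self, action):
        # action 1 moves right toward the distant large reward; action 0 moves left.
        if action == 1:
            self.state = min(self.state + 1, self.n - 1)
        else:
            self.state = max(self.state - 1, 0)
        self.t += 1
        # Small reward at the leftmost state, large reward at the rightmost state:
        # greedy agents latch onto the small reward, so deep exploration is required.
        reward = 1.0 if self.state == self.n - 1 else (0.001 if self.state == 0 else 0.0)
        done = self.t >= self.horizon
        return self.state, reward, done
```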
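
The Experiment Setup row states that LMC training is optimized by Adam SGLD [33] with α_1 = 0.9, α_2 = 0.999, and bias factor 0.1. The snippet below is a rough sketch of one Adam-style SGLD step under those values; the exact update rule of [33] and the authors' implementation may differ, and the role assigned to the bias factor here is an assumption.

```python
# Rough sketch of an Adam-style SGLD update (not the exact rule of [33]); call it under
# torch.no_grad() if `param` tracks gradients.
import torch

def adam_sgld_step(param, grad, m, v, lr=1e-3, alpha1=0.9, alpha2=0.999,
                   bias_factor=0.1, beta=1e2, eps=1e-8):
    m.mul_(alpha1).add_(grad, alpha=1 - alpha1)               # first-moment estimate
    v.mul_(alpha2).addcmul_(grad, grad, value=1 - alpha2)     # second-moment estimate
    precond = 1.0 / (v.sqrt() + eps)                          # Adam-style preconditioner
    drift = grad + bias_factor * m                            # assumed use of the bias factor
    noise = torch.randn_like(param) * torch.sqrt(2 * lr * precond / beta)
    param.add_(-lr * precond * drift + noise)                 # Langevin step with injected noise
    return param, m, v
```

Here `beta` plays the role of the inverse temperature that scales the injected Gaussian noise, analogous to β_{m,k} in (3.7).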