Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning
Authors: Yanwen Ba, Xuan Liu, Xinning Chen, Hao Wang, Yang Xu, Kenli Li, Shigeng Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CONS in several challenging multi-agent tasks and find it excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in terms of convergence rate and final performance. |
| Researcher Affiliation | Academia | ¹College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; ²School of Computer Science and Engineering, Central South University, Changsha, China |
| Pseudocode | Yes | Algorithm 1: Sample an action a_i to be executed through targeted exploration. |
| Open Source Code | Yes | We provide open-source implementations of CONS at https://github.com/byw0919/CONS |
| Open Datasets | Yes | Cleanup (Yang et al. 2020) is a classic public goods game where agents can earn rewards by collecting apples whose growth rate is negatively correlated with the amount of waste in the river. |
| Dataset Splits | No | The paper describes simulation environments where data is generated through agent interaction rather than using a static dataset with explicit train/validation/test splits. |
| Hardware Specification | No | The paper mentions utilizing 'resources from the High Performance Computing Center of Central South University' but does not specify any exact hardware details such as GPU or CPU models, or memory. |
| Software Dependencies | No | The paper mentions using DRQN and DQN as underlying algorithms, but it does not specify any software versions for libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | T is the temperature parameter used to adjust the randomness of decisions, and we set it to 1. In the above equation, e_i is the episode when knowledge sharing is initiated and a is a hyperparameter used to adjust the descent rate of w_n. Table 1: Two settings for the Patient Gold Miner (PGM) environment. |
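
The Experiment Setup row quotes a temperature parameter T that adjusts the randomness of decisions. As a rough illustration of what such a parameter typically does, the sketch below assumes the temperature enters through a standard Boltzmann (softmax) distribution over per-action scores. Which scores CONS actually feeds into this step is not stated here, and the function names `softmax_with_temperature` and `sample_action` are hypothetical, not taken from the CONS repository.

```python
import math
import random


def softmax_with_temperature(scores, temperature=1.0):
    """Turn per-action scores into probabilities (assumed Boltzmann form).

    Higher temperature flattens the distribution (more random choices);
    lower temperature concentrates it on the highest-scoring action.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(scores, temperature=1.0, rng=random):
    """Sample an action index in proportion to its softmax probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(range(len(scores)), weights=probs, k=1)[0]


if __name__ == "__main__":
    example_scores = [1.0, 2.0, 3.0]  # illustrative values only
    print(softmax_with_temperature(example_scores, temperature=1.0))
    print(sample_action(example_scores, temperature=1.0))
```

With T = 1, as in the quoted setup, the scores are used unscaled; raising T would push the probabilities toward uniform.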