Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning
Authors: Yanwen Ba, Xuan Liu, Xinning Chen, Hao Wang, Yang Xu, Kenli Li, Shigeng Zhang
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate CONS in several challenging multi-agent tasks and find it excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in terms of convergence rate and final performance. |
| Researcher Affiliation | Academia | 1College of Computer Science and Electronic Engineering, Hunan University, Changsha, China 2School of Computer Science and Engineering, Central South University, Changsha, China |
| Pseudocode | Yes | Algorithm 1: Sample an action ai to be executed through targeted exploration. |
| Open Source Code | Yes | We provide open-source implementations of CONS in https://github.com/byw0919/CONS |
| Open Datasets | Yes | Cleanup (Yang et al. 2020) is a classic public goods game where agents can earn rewards by collecting apples whose growth rate is negatively correlated with the amount of waste in the river. |
| Dataset Splits | No | The paper describes simulation environments where data is generated through agent interaction rather than using a static dataset with explicit train/validation/test splits. |
| Hardware Specification | No | The paper mentions utilizing 'resources from the High Performance Computing Center of Central South University' but does not specify any exact hardware details such as GPU or CPU models, or memory. |
| Software Dependencies | No | The paper mentions using DRQN and DQN as underlying algorithms, but it does not specify any software versions for libraries, frameworks, or programming languages used in the implementation. |
| Experiment Setup | Yes | T is the temperature parameter used to adjust the randomness of decisions, and we set it to 1. In the above equation, e_i is the episode when knowledge sharing is initiated and a is a hyperparameter used to adjust the descent rate of w_n. Table 1: Two settings for Patient Gold Miner (PGM) environment. |
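The Experiment Setup row quotes the paper's use of a temperature parameter T (set to 1) to adjust the randomness of action decisions. As a minimal sketch only, assuming the common Boltzmann (softmax) formulation over Q-values rather than the paper's exact equation, temperature-controlled sampling might look like the following; the function name and inputs are illustrative:

```python
import numpy as np

def boltzmann_sample(q_values, temperature=1.0, rng=None):
    """Sample an action index from a softmax (Boltzmann) distribution
    over Q-values. Higher temperature -> more uniform (more random)
    choices; lower temperature -> greedier choices."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(q_values, dtype=float) / temperature
    logits -= logits.max()  # subtract max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(rng.choice(len(probs), p=probs))

# With T = 1, higher-valued actions are proportionally more likely
# while exploration of other actions remains possible.
action = boltzmann_sample([1.0, 2.0, 0.5], temperature=1.0)
```

As the temperature approaches 0 the distribution concentrates on the greedy action, which is why T is described as adjusting decision randomness.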