Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Cautiously-Optimistic Knowledge Sharing for Cooperative Multi-Agent Reinforcement Learning

Authors: Yanwen Ba, Xuan Liu, Xinning Chen, Hao Wang, Yang Xu, Kenli Li, Shigeng Zhang

AAAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate CONS in several challenging multi-agent tasks and find it excels in environments where optimal behavioral patterns are difficult to discover, surpassing the baselines in terms of convergence rate and final performance.
Researcher Affiliation	Academia	College of Computer Science and Electronic Engineering, Hunan University, Changsha, China; School of Computer Science and Engineering, Central South University, Changsha, China
Pseudocode	Yes	Algorithm 1: Sample an action a_i to be executed through targeted exploration.
Open Source Code	Yes	We provide open-source implementations of CONS in https://github.com/byw0919/CONS
Open Datasets	Yes	Cleanup (Yang et al. 2020) is a classic public goods game where agents can earn rewards by collecting apples whose growth rate is negatively correlated with the amount of waste in the river.
Dataset Splits	No	The paper describes simulation environments where data is generated through agent interaction rather than using a static dataset with explicit train/validation/test splits.
Hardware Specification	No	The paper mentions utilizing 'resources from the High Performance Computing Center of Central South University' but does not specify any exact hardware details such as GPU or CPU models, or memory.
Software Dependencies	No	The paper mentions using DRQN and DQN as underlying algorithms, but it does not specify any software versions for libraries, frameworks, or programming languages used in the implementation.
Experiment Setup	Yes	T is the temperature parameter used to adjust the randomness of decisions, and we set it to 1. In the above equation, e_i is the episode when knowledge sharing is initiated and a is a hyperparameter used to adjust the descent rate of w_n. Table 1: Two settings for the Patient Gold Miner (PGM) environment.
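The Experiment Setup row states only that the temperature T controls the randomness of decisions and is set to 1. A minimal sketch of what such a parameter typically does, assuming T enters a standard Boltzmann (softmax) action-selection rule over per-action values. The paper's actual equation, its decaying weight w_n, and the targeted exploration of Algorithm 1 are not reproduced here, and the function names `boltzmann_probs` and `sample_action` are hypothetical.

```python
import math
import random


def boltzmann_probs(values, temperature=1.0):
    """Softmax over action values with temperature T.

    Assumption: a standard Boltzmann rule, not the paper's exact equation.
    Higher T flattens the distribution (more random); lower T sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(values, temperature=1.0, rng=random):
    """Draw one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


if __name__ == "__main__":
    q_values = [1.0, 2.0, 0.5]  # hypothetical per-action values
    print(boltzmann_probs(q_values, temperature=1.0))
    print(sample_action(q_values, temperature=1.0, rng=random.Random(0)))
```

With T = 1 the values are used unscaled; dividing by a smaller T would push almost all probability onto the highest-valued action, while a larger T moves the distribution toward uniform.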