Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning

Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato

Pages: 9396-9404

AAAI 2022

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we show the effects of the theories in practice by comparing different forms of centralized critics on a wide range of common benchmarks, and detail how various environmental properties are related to the effectiveness of different types of critics. ... Supported by a wide array of experiments, we also discuss the implications of our theories in practice. ... 5 Experiments To understand the performance of centralized critics in practice, we test state-based critics and history-based critics using vanilla Advantage Actor-Critic with a centralized critic.
Researcher Affiliation	Academia	Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato Northeastern University EMAIL
Pseudocode	No	The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code	No	The paper does not provide any statement about making its source code available, nor does it provide a link to a code repository.
Open Datasets	Yes	The experiments were conducted on ... Dec-POMDP domain Dec-Tiger (Nair et al. 2003), Meeting-in-a-Grid domains (Bernstein, Hansen, and Zilberstein 2005; Amato, Dibangoye, and Zilberstein 2009), Find Treasure (Jiang 2019), Multi-agent Recycling (Amato, Bernstein, and Zilberstein 2007), Box Pushing (Seuken and Zilberstein 2007) and Cleaner (Jiang 2019), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al. 2019).
Dataset Splits	No	The paper mentions 'Hyperparameters are individually tuned while fixing other hyperparameters' but does not specify how data was split for training, validation, and testing (e.g., percentages or counts) or explicitly state the use of a validation set.
Hardware Specification	Yes	The experiments were conducted on compute clusters with nodes equipped with Dual Intel Xeon E5-2650 CPUs and 128GB of RAM.
Software Dependencies	No	The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, frameworks).
Experiment Setup	No	The paper states 'Hyperparameters are individually tuned while fixing other hyperparameters' and describes the general approach ('vanilla Advantage Actor-Critic'), but it does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations.