A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning

Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato

Pages: 9396-9404

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we show the effects of the theories in practice by comparing different forms of centralized critics on a wide range of common benchmarks, and detail how various environmental properties are related to the effectiveness of different types of critics. ... Supported by a wide array of experiments, we also discuss the implications of our theories in practice. ... 5 Experiments. To understand the performance of centralized critics in practice, we test state-based critics and history-based critics using vanilla Advantage Actor-Critic with a centralized critic.
Researcher Affiliation | Academia | Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato; Northeastern University; {lu.xue, baisero.a, xiao.yuch, c.amato}@northeastern.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement about making its source code available, nor does it provide a link to a code repository.
Open Datasets | Yes | The experiments were conducted on ... Dec-POMDP domain Dec-Tiger (Nair et al. 2003), Meeting-in-a-Grid domains (Bernstein, Hansen, and Zilberstein 2005; Amato, Dibangoye, and Zilberstein 2009), Find Treasure (Jiang 2019), Multi-agent Recycling (Amato, Bernstein, and Zilberstein 2007), Box Pushing (Seuken and Zilberstein 2007) and Cleaner (Jiang 2019), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al. 2019).
Dataset Splits | No | The paper mentions 'Hyperparameters are individually tuned while fixing other hyperparameters' but does not specify how data was split for training, validation, and testing (e.g., percentages or counts) or explicitly state the use of a validation set.
Hardware Specification | Yes | The experiments were conducted on compute clusters with nodes equipped with Dual Intel Xeon E5-2650 CPUs and 128GB of RAM.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, frameworks).
Experiment Setup | No | The paper states 'Hyperparameters are individually tuned while fixing other hyperparameters' and describes the general approach ('vanilla Advantage Actor-Critic'), but it does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations.
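
The Research Type entry contrasts state-based and history-based centralized critics under vanilla Advantage Actor-Critic. Since the paper provides neither pseudocode nor source code, the following is only a minimal illustrative sketch of that distinction, not the authors' implementation. It assumes a one-step TD-error advantage estimate; the tabular critics, state and observation names, values, and discount factor are all hypothetical.

```python
from typing import Callable, Dict, Hashable, Tuple

GAMMA = 0.9  # hypothetical discount factor, not taken from the paper


def one_step_advantage(
    value: Callable[[Hashable], float],
    x: Hashable,
    reward: float,
    x_next: Hashable,
    done: bool,
    gamma: float = GAMMA,
) -> float:
    """One-step TD-error advantage estimate: r + gamma * V(x') - V(x).

    `x` is whatever the centralized critic conditions on: the true state
    for a state-based critic, or the joint observation history for a
    history-based critic.
    """
    bootstrap = 0.0 if done else gamma * value(x_next)
    return reward + bootstrap - value(x)


# Hypothetical tabular critics with illustrative numbers only.
state_values: Dict[str, float] = {"tiger-left": 1.0, "tiger-right": 2.0}
history_values: Dict[Tuple[str, ...], float] = {
    ("hear-left",): 0.5,
    ("hear-left", "hear-left"): 1.5,
}


def state_critic(state: Hashable) -> float:
    """State-based critic: V(s), indexed by the underlying state."""
    return state_values[state]


def history_critic(history: Hashable) -> float:
    """History-based critic: V(h), indexed by the observation history."""
    return history_values[history]


# The same update rule is applied; only the critic's input differs.
adv_state = one_step_advantage(
    state_critic, "tiger-left", 1.0, "tiger-right", done=False
)
adv_history = one_step_advantage(
    history_critic,
    ("hear-left",),
    1.0,
    ("hear-left", "hear-left"),
    done=False,
)
```

In this sketch the actor update would be identical in both cases; the only difference is whether the advantage is computed from values indexed by the state or by the history.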