A Deeper Understanding of State-Based Critics in Multi-Agent Reinforcement Learning
Authors: Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato
Pages: 9396-9404
AAAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we show the effects of the theories in practice by comparing different forms of centralized critics on a wide range of common benchmarks, and detail how various environmental properties are related to the effectiveness of different types of critics. ... Supported by a wide array of experiments, we also discuss the implications of our theories in practice. ... (Section 5, Experiments) To understand the performance of centralized critics in practice, we test state-based critics and history-based critics using vanilla Advantage Actor-Critic with a centralized critic. |
| Researcher Affiliation | Academia | Xueguang Lyu, Andrea Baisero, Yuchen Xiao, Christopher Amato Northeastern University {lu.xue, baisero.a, xiao.yuch, c.amato}@northeastern.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement about making its source code available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | The experiments were conducted on ... Dec-POMDP domain Dec-Tiger (Nair et al. 2003), Meeting-in-a-Grid domains (Bernstein, Hansen, and Zilberstein 2005; Amato, Dibangoye, and Zilberstein 2009), Find Treasure (Jiang 2019), Multi-agent Recycling (Amato, Bernstein, and Zilberstein 2007), Box Pushing (Seuken and Zilberstein 2007) and Cleaner (Jiang 2019), StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al. 2019). |
| Dataset Splits | No | The paper mentions 'Hyperparameters are individually tuned while fixing other hyperparameters' but does not specify how data was split for training, validation, and testing (e.g., percentages or counts) or explicitly state the use of a validation set. |
| Hardware Specification | Yes | The experiments were conducted on compute clusters with nodes equipped with Dual Intel Xeon E5-2650 CPUs and 128GB of RAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies used in the experiments (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | No | The paper states 'Hyperparameters are individually tuned while fixing other hyperparameters' and describes the general approach ('vanilla Advantage Actor-Critic'), but it does not provide specific hyperparameter values (e.g., learning rate, batch size) or detailed system-level training configurations. |
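
The table notes that the paper gives no pseudocode and no hyperparameter values, so the following is only an illustrative sketch of the distinction the experiments compare, not the authors' implementation. It assumes tabular critics and a one-step advantage estimate: a state-based critic is indexed by the environment state, while a history-based critic is indexed by the joint observation-action history. The function names, the transition, the discount factor, and the learning rate are all hypothetical.

```python
from collections import defaultdict


def td_advantage(values, key, next_key, reward, gamma=0.99, done=False):
    """One-step advantage estimate: r + gamma * V(next) - V(current)."""
    bootstrap = 0.0 if done else gamma * values[next_key]
    return reward + bootstrap - values[key]


def critic_update(values, key, advantage, lr=0.1):
    """Move V(key) a small step toward the one-step TD target."""
    values[key] += lr * advantage


# State-based critic: indexed by the true environment state.
state_critic = defaultdict(float)
# History-based critic: indexed by the joint observation-action history.
history_critic = defaultdict(float)

# A hypothetical two-agent transition (all values are illustrative).
state, next_state = "s0", "s1"
joint_history = (("o_left", "o_right"),)
next_joint_history = joint_history + (("listen", "listen"), ("o_left", "o_left"))
reward = -2.0

adv_state = td_advantage(state_critic, state, next_state, reward)
adv_history = td_advantage(history_critic, joint_history, next_joint_history, reward)

critic_update(state_critic, state, adv_state)
critic_update(history_critic, joint_history, adv_history)
```

In an actor-critic loop, each agent's policy gradient would be weighted by one of these advantages. The only difference between the two variants in this sketch is the key used to look up the critic's value.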