NA²Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning

Authors: Zichuan Liu, Yuanyang Zhu, Chunlin Chen

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that NA2Q consistently achieves superior performance compared to different state-of-the-art methods on all challenging tasks, while yielding human-like interpretability."
Researcher Affiliation | Academia | "Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing, China. Correspondence to: Yuanyang Zhu <yuanyang@smail.nju.edu.cn>, Chunlin Chen <clchen@nju.edu.cn>."
Pseudocode | Yes | "D. Pseudo Code: Algorithm 1 Neural Attention Additive Q-learning"
Open Source Code | Yes | "The source code is available at https://github.com/zichuan-liu/NA2Q."
Open Datasets | Yes | "In this section, we demonstrate our experimental results of NA2Q on challenging tasks over LBF (Christianos et al., 2020) and SMAC (Samvelyan et al., 2019) benchmarks."
Dataset Splits | No | The paper specifies training steps, batch sizes, and test intervals, but it does not explicitly describe training/validation/test dataset splits or validation-set details; only 'train' and 'test' phases are clearly defined.
Hardware Specification | Yes | "Experiments are performed on an NVIDIA RTX 3080Ti GPU and an Intel I9-12900k CPU."
Software Dependencies | No | The paper mentions optimizers (RMSprop, Adam) and network components (GRU, ReLU), but it does not provide version numbers for software dependencies (e.g., Python, PyTorch, or CUDA).
Experiment Setup | Yes | "In this paper, we utilize a recurrent style local Q-network with its default hyperparameters, specifically, the individual Q-function Qi(τi, ui) contains a GRU layer with a 64-dimensional hidden state and a ReLU activation layer. The optimization for individual Q-functions is conducted using RMSprop with weight decay and a learning rate of 0.0005. Regarding the generative model Gω, both encoder and decoder are comprised of two fully connected layers with a 32-dimensional hidden state, optimizing the learnable parameters by Adam with a learning rate of 0.0005. Additionally, we set the weight β of the loss to 0.1."
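The quoted experiment setup can be collected into a small configuration sketch. The hyperparameter values (hidden sizes, learning rates, β) come from the paper's reported setup; the dictionary keys, structure, and the helper function below are illustrative assumptions, since the excerpt does not specify a config format, a GRU input size, or the weight-decay value.

```python
# Hedged sketch of the NA2Q training hyperparameters quoted above.
# Values are taken from the reported setup; key names and layout are
# assumptions made for illustration only.
NA2Q_CONFIG = {
    "local_q_network": {
        "recurrent_layer": "GRU",   # per-agent recurrent Q-function Qi(tau_i, u_i)
        "hidden_dim": 64,           # 64-dimensional GRU hidden state
        "activation": "ReLU",
        "optimizer": "RMSprop",     # with weight decay (value not reported)
        "learning_rate": 5e-4,      # 0.0005
    },
    "generative_model": {           # G_omega: encoder and decoder
        "encoder_fc_layers": 2,     # two fully connected layers each
        "decoder_fc_layers": 2,
        "hidden_dim": 32,           # 32-dimensional hidden state
        "optimizer": "Adam",
        "learning_rate": 5e-4,      # 0.0005
    },
    "loss_weight_beta": 0.1,        # weight beta on the auxiliary loss term
}


def optimizer_settings(cfg: dict) -> dict:
    """Return {component: (optimizer, learning_rate)} for a quick sanity check."""
    return {
        name: (sub["optimizer"], sub["learning_rate"])
        for name, sub in cfg.items()
        if isinstance(sub, dict)
    }
```

A quick check against the reported values: `optimizer_settings(NA2Q_CONFIG)` yields RMSprop at 0.0005 for the local Q-networks and Adam at 0.0005 for the generative model, matching the quoted setup.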