NA²Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning

Authors: Zichuan Liu, Yuanyang Zhu, Chunlin Chen

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments show that NA2Q consistently achieves superior performance compared to different state-of-the-art methods on all challenging tasks, while yielding human-like interpretability."
Researcher Affiliation | Academia | "Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing, China. Correspondence to: Yuanyang Zhu <yuanyang@smail.nju.edu.cn>, Chunlin Chen <clchen@nju.edu.cn>."
Pseudocode | Yes | "D. Pseudo Code: Algorithm 1 Neural Attention Additive Q-learning"
Open Source Code | Yes | "The source code is available at https://github.com/zichuan-liu/NA2Q."
Open Datasets | Yes | "In this section, we demonstrate our experimental results of NA2Q on challenging tasks over LBF (Christianos et al., 2020) and SMAC (Samvelyan et al., 2019) benchmarks."
Dataset Splits | No | The paper specifies training steps, batch sizes, and test intervals, but it does not explicitly describe training/validation/test dataset splits or validation-set details; only 'train' and 'test' phases are clearly defined.
Hardware Specification | Yes | "Experiments are performed on an NVIDIA RTX 3080Ti GPU and an Intel I9-12900k CPU."
Software Dependencies | No | The paper mentions optimizers (RMSprop, Adam) and network components (GRU, ReLU), but it does not provide version numbers for software dependencies (e.g., Python, PyTorch, or CUDA).
Experiment Setup | Yes | "In this paper, we utilize a recurrent style local Q-network with its default hyperparameters, specifically, the individual Q-function Qi(τi, ui) contains a GRU layer with a 64-dimensional hidden state and a ReLU activation layer. The optimization for individual Q-functions is conducted using RMSprop with weight decay and a learning rate of 0.0005. Regarding the generative model Gω, both encoder and decoder are comprised of two fully connected layers with a 32-dimensional hidden state, optimizing the learnable parameters by Adam with a learning rate of 0.0005. Additionally, we set the weight β of the loss to 0.1."
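The quoted experiment setup can be collected into a small configuration sketch. The hyperparameter values (hidden sizes, learning rates, β) come from the paper's reported setup; the dictionary keys, structure, and the helper function below are illustrative assumptions, since the excerpt does not specify a config format, a GRU input size, or the weight-decay value.

```python
# Hedged sketch of the NA2Q training hyperparameters quoted above.
# Values are taken from the reported setup; key names and layout are
# assumptions made for illustration only.
NA2Q_CONFIG = {
    "local_q_network": {
        "recurrent_layer": "GRU",   # per-agent recurrent Q-function Qi(tau_i, u_i)
        "hidden_dim": 64,           # 64-dimensional GRU hidden state
        "activation": "ReLU",
        "optimizer": "RMSprop",     # with weight decay (value not reported)
        "learning_rate": 5e-4,      # 0.0005
    },
    "generative_model": {           # G_omega: encoder and decoder
        "encoder_fc_layers": 2,     # two fully connected layers each
        "decoder_fc_layers": 2,
        "hidden_dim": 32,           # 32-dimensional hidden state
        "optimizer": "Adam",
        "learning_rate": 5e-4,      # 0.0005
    },
    "loss_weight_beta": 0.1,        # weight beta on the auxiliary loss term
}


def optimizer_settings(cfg: dict) -> dict:
    """Return {component: (optimizer, learning_rate)} for a quick sanity check."""
    return {
        name: (sub["optimizer"], sub["learning_rate"])
        for name, sub in cfg.items()
        if isinstance(sub, dict)
    }
```

A quick check against the reported values: `optimizer_settings(NA2Q_CONFIG)` yields RMSprop at 0.0005 for the local Q-networks and Adam at 0.0005 for the generative model, matching the quoted setup.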