NA$^2$Q: Neural Attention Additive Model for Interpretable Multi-Agent Q-Learning
Authors: Zichuan Liu, Yuanyang Zhu, Chunlin Chen
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments show that NA2Q consistently achieves superior performance compared to different state-of-the-art methods on all challenging tasks, while yielding human-like interpretability. |
| Researcher Affiliation | Academia | Department of Control Science and Intelligence Engineering, Nanjing University, Nanjing, China. Correspondence to: Yuanyang Zhu <yuanyang@smail.nju.edu.cn>, Chunlin Chen <clchen@nju.edu.cn>. |
| Pseudocode | Yes | D. Pseudo Code: Algorithm 1 Neural Attention Additive Q-learning |
| Open Source Code | Yes | The source code is available at https://github.com/zichuan-liu/NA2Q. |
| Open Datasets | Yes | In this section, we demonstrate our experimental results of NA2Q on challenging tasks over LBF (Christianos et al., 2020) and SMAC (Samvelyan et al., 2019) benchmarks. |
| Dataset Splits | No | The paper specifies training steps, batch sizes, and test intervals, but it does not explicitly mention training/validation/test dataset splits or validation set details. Only 'train' and 'test' are clearly defined. |
| Hardware Specification | Yes | Experiments are performed on an NVIDIA RTX 3080Ti GPU and an Intel I9-12900k CPU. |
| Software Dependencies | No | The paper mentions optimizers like RMSprop and Adam, and network components like GRU and ReLU, but does not provide specific version numbers for software dependencies (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | In this paper, we utilize a recurrent-style local Q-network with its default hyperparameters; specifically, the individual Q-function $Q_i(\tau_i, u_i)$ contains a GRU layer with a 64-dimensional hidden state and a ReLU activation layer. The optimization for individual Q-functions is conducted using RMSprop with weight decay and a learning rate of 0.0005. Regarding the generative model $G_\omega$, both encoder and decoder are comprised of two fully connected layers with a 32-dimensional hidden state, optimizing the learnable parameters by Adam with a learning rate of 0.0005. Additionally, we set the weight $\beta$ of the loss to 0.1. |
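
The Experiment Setup row above can be summarized in a minimal PyTorch sketch. This is not the authors' implementation (see their repository for that); the class names, input/output dimensions, and the weight-decay value are illustrative assumptions, while the GRU width (64), ReLU activation, 32-dimensional encoder/decoder hidden layers, optimizer choices, learning rate of 0.0005, and loss weight of 0.1 follow the quoted setup.

```python
import torch
import torch.nn as nn

# Per-agent recurrent Q-network: GRU with 64-dim hidden state and ReLU.
# obs_dim and n_actions are placeholder values, not taken from the paper.
class RNNAgentQ(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.fc_in = nn.Linear(obs_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.rnn = nn.GRUCell(hidden_dim, hidden_dim)   # 64-dim hidden state
        self.fc_out = nn.Linear(hidden_dim, n_actions)  # Q_i(tau_i, .)

    def forward(self, obs, hidden):
        x = self.relu(self.fc_in(obs))
        h = self.rnn(x, hidden)
        return self.fc_out(h), h

# Encoder (and, symmetrically, decoder) of the generative model G_omega:
# two fully connected layers with a 32-dimensional hidden state.
# Layer sizes other than the hidden width are assumptions.
class Encoder(nn.Module):
    def __init__(self, in_dim: int, latent_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

# Optimizers as reported: RMSprop with weight decay for the individual
# Q-functions, Adam for the generative model, both with lr = 0.0005.
# The weight_decay value itself is an assumption; the paper does not state it.
agent = RNNAgentQ(obs_dim=32, n_actions=6)
encoder = Encoder(in_dim=32, latent_dim=8)
q_optim = torch.optim.RMSprop(agent.parameters(), lr=5e-4, weight_decay=1e-5)
g_optim = torch.optim.Adam(encoder.parameters(), lr=5e-4)
beta = 0.1  # weight of the generative-model loss term in the total loss
```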