Understanding Attention and Generalization in Graph Neural Networks
Authors: Boris Knyazev, Graham W. Taylor, Mohamed Amer
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design simple graph reasoning tasks that allow us to study attention in a controlled environment. We find that under typical conditions the effect of attention is negligible or even harmful, but under certain conditions it provides an exceptional gain in performance... We validate the effectiveness of this scheme on our synthetic datasets, as well as on MNIST and on real graph classification benchmarks... |
| Researcher Affiliation | Collaboration | Boris Knyazev, University of Guelph & Vector Institute (bknyazev@uoguelph.ca); Graham W. Taylor, University of Guelph & Vector Institute, Canada CIFAR AI Chair (gwtaylor@uoguelph.ca); Mohamed R. Amer, Robust.AI (mohamed@robust.ai) |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Source code and datasets are available at https://github.com/bknyaz/graph_attention_pool. |
| Open Datasets | Yes | We also experiment with MNIST images [13] and three molecule and social datasets... namely COLLAB [14, 15], PROTEINS [16], and D&D [17]. |
| Dataset Splits | Yes | For synthetic datasets, we tune them on a validation set generated in the same way as TEST-ORIG. For MNIST-75SP, we use part of the training set. For COLLAB, PROTEINS and D&D, we tune them using 10-fold cross-validation on the training set. (See the split sketch below this table.) |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies or library versions are listed; the paper only reports training hyperparameters: "We train all models with Adam [26], learning rate 1e-3, batch size 32, weight decay 1e-4 (see the Supp. Material for details)." |
| Experiment Setup | Yes | We build 2 layer GNNs for COLORS and 3 layer GNNs for other tasks with 64 filters in each layer, except for MNIST-75SP where we have more filters. ... We train all models with Adam [26], learning rate 1e-3, batch size 32, weight decay 1e-4 (see the Supp. Material for details). (See the configuration sketch below this table.) |
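
The Dataset Splits row quotes the tuning protocol for COLLAB, PROTEINS and D&D: hyperparameters are tuned with 10-fold cross-validation on the training set. The sketch below illustrates that protocol; the use of scikit-learn's KFold, the function name, and the seed are assumptions for illustration and are not taken from the released code.

```python
# Minimal sketch of the 10-fold tuning protocol described above,
# assuming scikit-learn's KFold; the paper does not specify a splitting library.
import numpy as np
from sklearn.model_selection import KFold

def ten_fold_splits(num_train_graphs, seed=0):
    """Yield (train_idx, val_idx) index pairs over the training graphs,
    as used for hyperparameter tuning on COLLAB, PROTEINS and D&D."""
    kf = KFold(n_splits=10, shuffle=True, random_state=seed)
    for train_idx, val_idx in kf.split(np.arange(num_train_graphs)):
        yield train_idx, val_idx

# Example usage: average validation accuracy over the 10 folds
# to compare candidate hyperparameter settings.
```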
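
The Experiment Setup row reports 2-layer GNNs for COLORS and 3-layer GNNs for the other tasks, 64 filters per layer, trained with Adam at learning rate 1e-3, batch size 32 and weight decay 1e-4. The sketch below wires those hyperparameters into a minimal PyTorch Geometric model; GCNConv, the mean-pooling readout and all names are stand-ins for illustration, not the paper's attention/pooling architecture.

```python
# Minimal sketch of the reported training configuration: 2-3 graph conv
# layers with 64 filters, Adam, lr 1e-3, batch size 32, weight decay 1e-4.
# GCNConv and global_mean_pool are stand-ins, not the paper's exact layers.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool

class SimpleGNN(torch.nn.Module):
    def __init__(self, in_dim, num_classes, num_layers=3, hidden=64):
        super().__init__()
        dims = [in_dim] + [hidden] * num_layers
        self.convs = torch.nn.ModuleList(
            [GCNConv(dims[i], dims[i + 1]) for i in range(num_layers)])
        self.classifier = torch.nn.Linear(hidden, num_classes)

    def forward(self, x, edge_index, batch):
        # Stack of graph convolutions followed by a graph-level readout.
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
        return self.classifier(global_mean_pool(x, batch))

model = SimpleGNN(in_dim=16, num_classes=2, num_layers=3)  # placeholder dims
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
# Batches of 32 graphs would come from a PyTorch Geometric DataLoader
# constructed with batch_size=32.
```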