Understanding Attention and Generalization in Graph Neural Networks

Authors: Boris Knyazev, Graham W. Taylor, Mohamed R. Amer

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design simple graph reasoning tasks that allow us to study attention in a controlled environment. We find that under typical conditions the effect of attention is negligible or even harmful, but under certain conditions it provides an exceptional gain in performance... We validate the effectiveness of this scheme on our synthetic datasets, as well as on MNIST and on real graph classification benchmarks... (a generic attention-readout sketch follows the table)
Researcher Affiliation | Collaboration | Boris Knyazev (University of Guelph, Vector Institute, bknyazev@uoguelph.ca); Graham W. Taylor (University of Guelph, Vector Institute, Canada CIFAR AI Chair, gwtaylor@uoguelph.ca); Mohamed R. Amer (Robust.AI, mohamed@robust.ai)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Source code and datasets are available at https://github.com/bknyaz/graph_attention_pool.
Open Datasets | Yes | We also experiment with MNIST images [13] and three molecule and social datasets... namely COLLAB [14, 15], PROTEINS [16], and D&D [17].
Dataset Splits | Yes | For synthetic datasets, we tune them on a validation set generated in the same way as TEST-ORIG. For MNIST-75SP, we use part of the training set. For COLLAB, PROTEINS and D&D, we tune them using 10-fold cross-validation on the training set. (a loading and split sketch follows the table)
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments were mentioned in the paper.
Software Dependencies | No | We train all models with Adam [26], learning rate 1e-3, batch size 32, weight decay 1e-4 (see the Supp. Material for details).
Experiment Setup | Yes | We build 2 layer GNNs for COLORS and 3 layer GNNs for other tasks with 64 filters in each layer, except for MNIST-75SP where we have more filters. ... We train all models with Adam [26], learning rate 1e-3, batch size 32, weight decay 1e-4 (see the Supp. Material for details). (a training-configuration sketch follows the table)
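
The Research Type row quotes the paper's finding that attention over nodes can either help or hurt graph classification. As context for what "attention" means here, the following is a minimal sketch of a generic attention-weighted node readout in PyTorch; it is not the authors' exact pooling scheme, and the layer sizes are illustrative only.

```python
# Generic attention-weighted readout over nodes (illustrative only;
# not the exact pooling scheme from the paper).
import torch
import torch.nn as nn

class AttentionReadout(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        # One scalar attention score per node, computed from its features.
        self.score = nn.Linear(n_features, 1)

    def forward(self, x):
        # x: node features of one graph, shape [N, n_features]
        alpha = torch.softmax(self.score(x), dim=0)  # [N, 1], sums to 1 over nodes
        return (alpha * x).sum(dim=0)                # attention-weighted graph embedding

readout = AttentionReadout(n_features=64)
graph_embedding = readout(torch.randn(10, 64))  # 10 nodes with 64-dim features
print(graph_embedding.shape)  # torch.Size([64])
```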
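
The Open Datasets and Dataset Splits rows name COLLAB, PROTEINS and D&D and report 10-fold cross-validation on the training set. Below is a minimal sketch of how such splits could be built, assuming PyTorch Geometric's TUDataset loader and scikit-learn's StratifiedKFold; the authors' repository may use its own loaders and fold indices.

```python
# Sketch: load a TU graph-classification benchmark and build 10-fold CV splits.
# Assumes PyTorch Geometric and scikit-learn; the authors' repository
# (https://github.com/bknyaz/graph_attention_pool) may handle this differently.
import numpy as np
from sklearn.model_selection import StratifiedKFold
from torch_geometric.datasets import TUDataset

def tenfold_splits(name="PROTEINS", n_folds=10, seed=111):
    # TU names for the paper's benchmarks: 'COLLAB', 'PROTEINS', 'DD'.
    dataset = TUDataset(root=f"./data/{name}", name=name)
    labels = np.array([int(g.y) for g in dataset])
    skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
    # Each element: (training graph indices, held-out graph indices) for one fold.
    return list(skf.split(np.zeros(len(labels)), labels))

folds = tenfold_splits("PROTEINS")
print(len(folds), "folds;", len(folds[0][0]), "training graphs in fold 0")
```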
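
The Software Dependencies and Experiment Setup rows quote concrete hyperparameters: Adam, learning rate 1e-3, batch size 32, weight decay 1e-4, and 64 filters per GNN layer. A hedged sketch of that optimizer configuration in PyTorch follows; the `TinyGNN` class is a placeholder, not the authors' model implementation.

```python
# Hedged sketch of the reported training configuration:
# Adam, lr 1e-3, batch size 32, weight decay 1e-4, 64 filters per layer.
# `TinyGNN` is a placeholder model, not the paper's implementation.
import torch
import torch.nn as nn

class TinyGNN(nn.Module):
    def __init__(self, in_features, n_classes, n_hidden=64, n_layers=3):
        super().__init__()
        dims = [in_features] + [n_hidden] * n_layers
        self.layers = nn.ModuleList(nn.Linear(i, o) for i, o in zip(dims[:-1], dims[1:]))
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, x, adj):
        # x: node features [N, in_features]; adj: (normalized) adjacency [N, N]
        for layer in self.layers:
            x = torch.relu(layer(adj @ x))       # simple neighborhood aggregation
        return self.classifier(x.mean(dim=0))    # mean-pool nodes, then classify

model = TinyGNN(in_features=7, n_classes=2)      # illustrative dimensions
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
batch_size = 32  # graphs per mini-batch, as reported in the paper
```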