Graph Attention Networks

Authors: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

ICLR 2018

Reproducibility Variable Result LLM Response
Research Type Experimental We have performed comparative evaluation of GAT models against a wide variety of strong baselines and previous approaches, on four established graph-based benchmark tasks (transductive as well as inductive), achieving or matching state-of-the-art performance across all of them.
Researcher Affiliation Collaboration Petar Veličković, Department of Computer Science and Technology, University of Cambridge, petar.velickovic@cst.cam.ac.uk; Guillem Cucurull, Centre de Visió per Computador, UAB, gcucurull@gmail.com; Arantxa Casanova, Centre de Visió per Computador, UAB, ar.casanova.8@gmail.com; Adriana Romero, Montréal Institute for Learning Algorithms / Facebook AI Research, adrianars@fb.com; Pietro Liò, Department of Computer Science and Technology, University of Cambridge, pietro.lio@cst.cam.ac.uk; Yoshua Bengio, Montréal Institute for Learning Algorithms, yoshua.umontreal@gmail.com
Pseudocode No The paper presents mathematical formulations of the GAT architecture but does not include a dedicated section or block explicitly labeled 'Pseudocode' or 'Algorithm'.
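Although the paper contains no labeled pseudocode block, its attentional layer is fully specified by its equations: e_ij = LeakyReLU(a^T [Wh_i ‖ Wh_j]), normalized by a softmax over each node's neighborhood (self-loops included), then used to aggregate the transformed features. A minimal dense NumPy sketch of one attention head, under the assumption of a full adjacency matrix with self-loops (function and variable names here are illustrative, not taken from the authors' code):

```python
import numpy as np

def gat_attention_head(H, A, W, a, slope=0.2):
    """One GAT attention head, dense NumPy sketch.

    H: (N, F) node features; A: (N, N) adjacency with self-loops;
    W: (F, Fp) shared linear transform; a: (2*Fp,) attention vector.
    slope: LeakyReLU negative slope (the paper uses 0.2).
    """
    Wh = H @ W                                    # (N, Fp) transformed features
    Fp = Wh.shape[1]
    # e[i, j] = a^T [Wh_i || Wh_j], decomposed into two projections
    e = (Wh @ a[:Fp])[:, None] + (Wh @ a[Fp:])[None, :]
    e = np.where(e > 0, e, slope * e)             # LeakyReLU
    e = np.where(A > 0, e, -1e9)                  # mask non-neighbors
    att = np.exp(e - e.max(axis=1, keepdims=True))
    att = att / att.sum(axis=1, keepdims=True)    # softmax over neighborhoods
    return att @ Wh                               # attention-weighted aggregation
```

In practice the paper computes multiple such heads in parallel and concatenates (or, in the final layer, averages) their outputs.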
Open Source Code Yes Our implementation of the GAT layer may be found at: https://github.com/PetarV-/GAT.
Open Datasets Yes We utilize three standard citation network benchmark datasets Cora, Citeseer and Pubmed (Sen et al., 2008) and closely follow the transductive experimental setup of Yang et al. (2016). In all of these datasets, nodes correspond to documents and edges to (undirected) citations. ... We make use of a protein-protein interaction (PPI) dataset that consists of graphs corresponding to different human tissues (Zitnik & Leskovec, 2017).
Dataset Splits Yes The predictive power of the trained models is evaluated on 1000 test nodes, and we use 500 additional nodes for validation purposes (the same ones as used by Kipf & Welling (2017)). ... The dataset contains 20 graphs for training, 2 for validation and 2 for testing.
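For quick reference, the split sizes quoted above can be written down as a small mapping (an illustrative summary of the quote, not a structure from the paper's repository):

```python
# Evaluation splits as stated in the paper.
SPLITS = {
    # Transductive citation benchmarks (Cora, Citeseer, Pubmed),
    # following Kipf & Welling (2017): node-level splits.
    "transductive": {"val_nodes": 500, "test_nodes": 1000},
    # Inductive PPI benchmark: entire graphs are held out.
    "ppi_graphs": {"train": 20, "val": 2, "test": 2},
}
```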
Hardware Specification No The paper acknowledges 'NVIDIA for the generous GPU support' but does not specify any particular GPU models or other hardware specifications used for the experiments.
Software Dependencies No The paper mentions 'TensorFlow (Abadi et al., 2015)' but does not provide specific version numbers for TensorFlow or any other software libraries used.
Experiment Setup Yes The first layer consists of K = 8 attention heads computing F = 8 features each... During training, we apply L2 regularization with λ = 0.0005. Furthermore, dropout (Srivastava et al., 2014) with p = 0.6 is applied to both layers' inputs... We utilize a batch size of 2 graphs during training. ...trained to minimize cross-entropy on the training nodes using the Adam SGD optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.01 for Pubmed, and 0.005 for all other datasets. In both cases we use an early stopping strategy on both the cross-entropy loss and accuracy (transductive) or micro-F1 (inductive) score on the validation nodes, with a patience of 100 epochs.