Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation

Authors: Shuo Zhang, Lei Xie

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments on node and graph classification confirm our theoretical analysis and show the competitive performance of our CPA models.
Researcher Affiliation | Academia | Shuo Zhang (1), Lei Xie (1,2,3). (1) Ph.D. Program in Computer Science, The Graduate Center, The City University of New York; (2) Department of Computer Science, Hunter College, The City University of New York; (3) Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University. szhang4@gradcenter.cuny.edu, lei.xie@hunter.cuny.edu
Pseudocode | No | The paper describes models using mathematical equations but does not provide pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | The code is available online: https://github.com/zetayue/CPA.
Open Datasets | Yes | In our experiment on graph classification, we use 6 benchmark datasets collected by [Kersting et al., 2020]: 2 social network datasets (REDDIT-BINARY (RE-B), REDDIT-MULTI-5K (RE-M5K)) and 4 bioinformatics datasets (MUTAG, PROTEINS, ENZYMES, NCI1). (A minimal loading sketch for these datasets follows the table.)
Dataset Splits | Yes | For all experiments, we perform 10-fold cross-validation and repeat the experiments 10 times for each dataset and each model. Following [Xu et al., 2019], to obtain a final accuracy for each run, we select the epoch with the best cross-validation accuracy averaged over all 10 folds. (A sketch of this epoch-selection rule follows the table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions using the Adam optimizer [Kingma and Ba, 2018] but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | For node classification, we use GAT [Veličković et al., 2018] as the Original model. In the GAT variants, we use 2 GNN layers and a hidden dimensionality of 32. The negative input slope of LeakyReLU in the GAT attention mechanism is 0.2. The number of heads in multi-head attention is 1. We use a dropout ratio of 0 and a weight decay value of 0. For graph classification, we build a GNN (GAT-GC) based on GAT as the Original model: we adopt the attention mechanism in GAT to specify the form of Equation (3). For the readout function, a naive way is to only consider the node embeddings from the last iteration. Although a sufficient number of iterations can help to avoid the cases in Theorem 1 by aggregating more diverse node features, the features from the later iterations may generalize worse, and GNNs usually have shallow structures [Xu et al., 2019; Zhou et al., 2018]. So GAT-GC adopts the same readout function as used in [Xu et al., 2018; Xu et al., 2019; Li et al., 2019], which concatenates graph embeddings from all iterations: $h_G = \big\Vert_{k=0}^{L} \mathrm{Readout}(\{h_i^{k} \mid i \in G\})$. For the Readout function, we use sum for bioinformatics datasets and mean for social network datasets. In the GAT-GC variants, we use 4 GNN layers. The hidden dimensionality is 32 for bioinformatics datasets and 64 for social network datasets. The negative input slope of LeakyReLU is 0.2. We use a single head in the multi-head attention. The following hyper-parameters are tuned for each dataset: (1) batch size in {32, 128}; (2) dropout ratio in {0, 0.5} after the dense layer; (3) L2 regularization from 0 to 0.001. (A minimal model sketch based on this setup follows the table.)
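
The six benchmarks listed under Open Datasets belong to the TU graph-learning collection of [Kersting et al., 2020]. The paper provides its own code at https://github.com/zetayue/CPA; purely as an illustration of how these datasets can be obtained, the sketch below uses PyTorch Geometric's TUDataset wrapper, which is an assumption of this note rather than the authors' pipeline. The dataset names are the registry identifiers in that collection.

```python
# Minimal sketch (not the authors' code): downloading the six TU benchmark
# datasets used in the graph-classification experiments via PyTorch Geometric.
from torch_geometric.datasets import TUDataset

DATASETS = ["REDDIT-BINARY", "REDDIT-MULTI-5K", "MUTAG",
            "PROTEINS", "ENZYMES", "NCI1"]

def load_benchmarks(root="data"):
    """Return a dict mapping dataset name -> TUDataset object."""
    return {name: TUDataset(root=root, name=name) for name in DATASETS}

if __name__ == "__main__":
    for name, ds in load_benchmarks().items():
        print(f"{name}: {len(ds)} graphs, {ds.num_classes} classes")
```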
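
The Dataset Splits row follows the protocol of [Xu et al., 2019]: record validation accuracy for every fold at every epoch, then report the single epoch whose accuracy averaged over the 10 folds is highest. Below is a minimal sketch of that selection rule, assuming the per-fold accuracies have already been collected into an array; the training loop itself and the array name are hypothetical placeholders.

```python
# Sketch of the epoch-selection rule described under "Dataset Splits":
# average the per-fold validation accuracies at each epoch, then report
# the epoch with the highest averaged accuracy.
import numpy as np

def select_best_epoch(fold_accuracies):
    """fold_accuracies: array of shape (n_folds, n_epochs) holding the
    validation accuracy of each fold at each epoch.
    Returns (best_epoch, mean_accuracy_at_that_epoch)."""
    mean_per_epoch = np.asarray(fold_accuracies).mean(axis=0)  # average over folds
    best_epoch = int(mean_per_epoch.argmax())
    return best_epoch, float(mean_per_epoch[best_epoch])

# Hypothetical usage with 10 folds and 100 recorded epochs:
# accs = np.random.rand(10, 100)
# epoch, acc = select_best_epoch(accs)
```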
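
The graph-classification setup above amounts to a 4-layer GAT-style network whose readout concatenates a pooled graph embedding from every iteration, $h_G = \big\Vert_{k=0}^{L} \mathrm{Readout}(\{h_i^{k} \mid i \in G\})$, with sum pooling on bioinformatics datasets and mean pooling on social networks. The sketch below is a minimal reconstruction under those assumptions using PyTorch Geometric's GATConv; it is not the authors' released CPA code, and the cardinality-preserving modifications proposed in the paper are deliberately omitted.

```python
# Minimal sketch (assumptions, not the authors' CPA implementation) of the
# GAT-GC baseline: 4 GATConv layers, a single attention head, LeakyReLU
# negative slope 0.2, and a readout that concatenates the pooled graph
# embedding from every iteration k = 0..L.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv, global_add_pool, global_mean_pool

class GATGC(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, num_classes,
                 num_layers=4, readout="sum"):
        super().__init__()
        self.convs = torch.nn.ModuleList()
        dims = [in_dim] + [hidden_dim] * num_layers
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            self.convs.append(GATConv(d_in, d_out, heads=1, negative_slope=0.2))
        # sum pooling for bioinformatics datasets, mean for social networks
        self.pool = global_add_pool if readout == "sum" else global_mean_pool
        # classifier over the concatenated per-iteration readouts
        self.lin = torch.nn.Linear(sum(dims), num_classes)

    def forward(self, x, edge_index, batch):
        readouts = [self.pool(x, batch)]          # k = 0: raw node features
        for conv in self.convs:
            x = F.relu(conv(x, edge_index))
            readouts.append(self.pool(x, batch))  # k = 1..L
        h_g = torch.cat(readouts, dim=-1)         # h_G = ||_{k=0}^{L} Readout(.)
        return self.lin(h_g)
```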