Improving Attention Mechanism in Graph Neural Networks via Cardinality Preservation
Authors: Shuo Zhang, Lei Xie
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on node and graph classification confirm our theoretical analysis and show the competitive performance of our CPA models. |
| Researcher Affiliation | Academia | Shuo Zhang (1), Lei Xie (1,2,3); (1) Ph.D. Program in Computer Science, The Graduate Center, The City University of New York; (2) Department of Computer Science, Hunter College, The City University of New York; (3) Helen & Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain & Mind Research Institute, Weill Cornell Medicine, Cornell University; szhang4@gradcenter.cuny.edu, lei.xie@hunter.cuny.edu |
| Pseudocode | No | The paper describes models using mathematical equations but does not provide pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | The code is available online: https://github.com/zetayue/CPA. |
| Open Datasets | Yes | In our experiment on graph classification, we use 6 benchmark datasets collected by [Kersting et al., 2020]: 2 social network datasets (REDDIT-BINARY (RE-B), REDDIT-MULTI-5K (RE-M5K)) and 4 bioinformatics datasets (MUTAG, PROTEINS, ENZYMES, NCI1). |
| Dataset Splits | Yes | For all experiments, we perform 10-fold cross-validation and repeat the experiments 10 times for each dataset and each model. Following [Xu et al., 2019], to get a final accuracy for each run, we select the epoch with the best cross-validation accuracy averaged over all 10 folds. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments. |
| Software Dependencies | No | The paper mentions using the Adam optimizer [Kingma and Ba, 2015], but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | For node classification, we use GAT [Veličković et al., 2018] as the Original model. In the GAT variants, we use 2 GNN layers and a hidden dimensionality of 32. The negative input slope of LeakyReLU in the GAT attention mechanism is 0.2. The number of heads in multi-head attention is 1. We use a dropout ratio of 0 and a weight decay value of 0. For graph classification, we build a GNN (GAT-GC) based on GAT as the Original model: we adopt the attention mechanism in GAT to specify the form of Equation (3). For the readout function, a naive way is to only consider the node embeddings from the last iteration. Although a sufficient number of iterations can help to avoid the cases in Theorem 1 by aggregating more diverse node features, the features from the later iterations may generalize worse and GNNs usually have shallow structures [Xu et al., 2019; Zhou et al., 2018]. So GAT-GC adopts the same function as used in [Xu et al., 2018; Xu et al., 2019; Li et al., 2019], which concatenates graph embeddings from all iterations: $h_G = \Vert_{k=0}^{L} \mathrm{Readout}(\{h_i^{k} \mid i \in G\})$. For the Readout function, we use sum for bioinformatics datasets and mean for social network datasets. In the GAT-GC variants, we use 4 GNN layers. The hidden dimensionality is 32 for bioinformatics datasets and 64 for social network datasets. The negative input slope of LeakyReLU is 0.2. We use a single head in the multi-head attention. The following hyper-parameters are tuned for each dataset: (1) batch size in {32, 128}; (2) dropout ratio in {0, 0.5} after the dense layer; (3) L2 regularization from 0 to 0.001. |
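
The Experiment Setup row fixes the GAT attention hyper-parameters used here: a single head and a LeakyReLU negative input slope of 0.2. For context, below is a minimal sketch of standard GAT attention coefficients in that configuration; it illustrates the attention form that Equation (3) adopts, not the paper's cardinality-preserving modification, and the function name, tensor shapes, and random inputs are assumptions made for the example (the authors' actual implementation is in the linked repository).

```python
import torch
import torch.nn.functional as F

def gat_attention_coefficients(h, W, a, neighbors_of_i, i, negative_slope=0.2):
    """GAT-style attention over the neighborhood of node i (single head).

    h: [num_nodes, in_dim] node features; W: [in_dim, hidden_dim]; a: [2 * hidden_dim].
    neighbors_of_i: indices of i's neighbors (include i itself if self-loops are used).
    Returns normalized attention coefficients alpha_ij for j in neighbors_of_i.
    """
    z = h @ W                                           # projected node features
    pair = torch.cat([z[i].expand(len(neighbors_of_i), -1), z[neighbors_of_i]], dim=-1)
    e = F.leaky_relu(pair @ a, negative_slope)          # unnormalized scores e_ij
    return torch.softmax(e, dim=0)                      # softmax over i's neighborhood

# Example with random inputs: 5 nodes, node 0 attends over neighbors {0, 1, 2}.
h = torch.randn(5, 8)
W = torch.randn(8, 32)
a = torch.randn(64)
alpha = gat_attention_coefficients(h, W, a, torch.tensor([0, 1, 2]), i=0)
print(alpha)  # three coefficients that sum to 1
```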
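The readout in GAT-GC concatenates per-iteration graph embeddings, $h_G = \Vert_{k=0}^{L} \mathrm{Readout}(\{h_i^{k} \mid i \in G\})$, with sum as the per-iteration Readout for the bioinformatics datasets and mean for the social network datasets. A minimal sketch of that readout is shown below; it assumes a plain Python list of per-iteration node-embedding tensors and is an illustration rather than the authors' released code.

```python
import torch

def gatgc_readout(node_embeddings, readout_op="sum"):
    """Concatenate per-iteration graph embeddings into a single graph embedding.

    node_embeddings: list of [num_nodes, hidden_dim] tensors, one per iteration
    k = 0..L (k = 0 being the initial node features).
    readout_op: "sum" for bioinformatics datasets, "mean" for social network datasets.
    """
    per_iteration = []
    for h_k in node_embeddings:
        per_iteration.append(h_k.sum(dim=0) if readout_op == "sum" else h_k.mean(dim=0))
    return torch.cat(per_iteration, dim=-1)  # h_G = concatenation over all iterations

# Example: 4 GNN layers plus the input iteration, hidden dimensionality 32, a 10-node graph.
h_list = [torch.randn(10, 32) for _ in range(5)]
h_G = gatgc_readout(h_list, readout_op="sum")
print(h_G.shape)  # torch.Size([160])
```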
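The Dataset Splits row follows the evaluation protocol of [Xu et al., 2019]: 10-fold cross-validation, with the reported accuracy for each run taken at the single epoch whose validation accuracy averaged over the 10 folds is highest. A sketch of that epoch-selection step is below, assuming a fold-by-epoch accuracy matrix as a placeholder for values produced by real training runs.

```python
import numpy as np

def select_best_epoch(acc):
    """Pick the epoch with the best cross-validation accuracy averaged over folds.

    acc: array of shape [num_folds, num_epochs] with the validation accuracy of
    each fold at each epoch (10 folds in the paper's setup).
    Returns (best_epoch, fold-averaged accuracy at that epoch).
    """
    mean_per_epoch = acc.mean(axis=0)          # average over the 10 folds
    best_epoch = int(mean_per_epoch.argmax())  # epoch with the highest averaged accuracy
    return best_epoch, float(mean_per_epoch[best_epoch])

# Example with placeholder numbers: 10 folds, 300 epochs.
rng = np.random.default_rng(0)
acc = rng.uniform(0.6, 0.9, size=(10, 300))
epoch, final_acc = select_best_epoch(acc)
print(epoch, final_acc)
```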