Adaptive Structural Fingerprints for Graph Attention Networks

Authors: Kai Zhang, Yaokang Zhu, Jun Wang, Jie Zhang

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results demonstrate the power of our approach in exploiting rich structural information in GAT and in alleviating the intrinsic oversmoothing problem in graph neural networks. In this section, we report experimental results of the proposed method and state-of-the-art algorithms on graph-based benchmark data sets and the transductive classification problem.
Researcher Affiliation | Academia | Kai Zhang, Department of Computer & Information Sciences, Temple University, Philadelphia, PA 19122, USA (kzhang980@gmail.com); Yaokang Zhu & Jun Wang, School of Computer Science and Technology, East China Normal University, Shanghai, China (52184501026@stu.ecnu.edu.cn, jwang@sei.ecnu.edu.cn); Jie Zhang, Institute of Brain-Inspired Intelligence, Fudan University, Shanghai, China (jzhang080@gmail.com)
Pseudocode | No | The paper describes the algorithm steps (Step 1 through Step 4) and provides a workflow diagram (Figure 4), but it does not include a formally labeled pseudocode or algorithm block.
Open Source Code | Yes | Our codes can be downloaded from the anonymous Github link http://github.com/AvigdorZ.
Open Datasets | Yes | We have selected three benchmark graph-structured data sets from (Sen et al., 2008), namely Cora, Citeseer, and Pubmed. The three data sets are all citation networks. We split each data set into three parts: training, validation, and testing, as shown in Table 1.
Dataset Splits | Yes | We split the data set into three parts: training, validation, and testing, as shown in Table 1. Algorithm performance is evaluated by classification precision on the test split. (A data-loading sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | Adam SGD is used for optimization, with learning rate λ = 5e-4. The paper does not mention specific software libraries or their version numbers, only the optimization algorithm.
Experiment Setup | Yes | Altogether two layers of message passing are adopted. In the first layer, one transformation matrix W ∈ R^{d×8} is learned for each of the 8 attention heads; in the second layer, a transformation matrix W ∈ R^{64×C} is applied to the concatenated features (from the 8 attention heads of the first layer), and one attention head is adopted, followed by a softmax operator, where C is the number of classes. The number of parameters is 64(d + C). For the Pubmed data set, 8 attention heads are used in the second layer due to the larger graph size. Adam SGD is used for optimization, with learning rate λ = 5e-4. Both the fingerprint size and the attention range are chosen as 2-hop neighbors in our approach. The restart probability is simply chosen as c = 0.5.
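
The benchmarks quoted under "Open Datasets" and "Dataset Splits" (Cora, Citeseer, Pubmed) are standard citation networks with public train/validation/test splits. The sketch below is a minimal reading aid, assuming PyTorch Geometric's Planetoid loader is available; it is not the authors' code, and it prints the common public split sizes rather than reproducing the paper's Table 1.

# Minimal sketch, assuming torch and torch_geometric are installed.
# Loads the three citation benchmarks and prints the sizes of the
# public train/validation/test masks shipped with the Planetoid splits.
from torch_geometric.datasets import Planetoid

for name in ["Cora", "Citeseer", "Pubmed"]:
    data = Planetoid(root=f"data/{name}", name=name)[0]
    print(name,
          "train:", int(data.train_mask.sum()),
          "val:", int(data.val_mask.sum()),
          "test:", int(data.test_mask.sum()))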
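
The "Experiment Setup" and "Software Dependencies" rows together describe a two-layer attention architecture (8 heads of size 8 in the first layer, giving 64 concatenated features, then a single head mapping to the C classes) trained with Adam at learning rate 5e-4. The sketch below shows that configuration using PyTorch Geometric's GATConv as a stand-in for the paper's ADSF-augmented attention layer; the structural-fingerprint term itself is omitted, and the class and variable names are illustrative.

import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    # Stand-in for the quoted setup: plain GATConv replaces the ADSF layer.
    def __init__(self, d, num_classes):
        super().__init__()
        # Layer 1: 8 heads, 8 features per head (one W in R^{d x 8} per head).
        self.conv1 = GATConv(d, 8, heads=8)
        # Layer 2: one head on the concatenated 64-dim features (W in R^{64 x C}).
        self.conv2 = GATConv(64, num_classes, heads=1)

    def forward(self, x, edge_index):
        x = F.elu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)  # logits; softmax is applied in the loss

model = TwoLayerGAT(d=1433, num_classes=7)  # Cora-sized example
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)

Per the quoted note on Pubmed, the second layer there would instead use 8 averaged heads (heads=8, concat=False).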
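
The last two sentences of the "Experiment Setup" row fix the fingerprint hyperparameters: 2-hop neighborhoods and a random-walk restart probability c = 0.5. The snippet below is a rough illustration of a random-walk-with-restart weighting restricted to a node's 2-hop neighborhood, assuming a dense NumPy adjacency matrix; the function name and the fixed-point iteration are placeholders, not the authors' implementation.

import numpy as np

def rwr_fingerprint(adj, seed, c=0.5, hops=2, iters=50):
    # Keep only nodes reachable from `seed` within `hops` steps.
    n = adj.shape[0]
    reach = np.eye(n)[seed]
    mask = reach.copy()
    for _ in range(hops):
        reach = reach @ adj
        mask = np.maximum(mask, (reach > 0).astype(float))
    sub = adj * np.outer(mask, mask)
    # Column-normalized transition matrix on the restricted sub-graph.
    deg = sub.sum(axis=0)
    trans = np.divide(sub, deg, out=np.zeros_like(sub), where=deg > 0)
    # Random walk with restart: p <- (1 - c) * T p + c * e.
    e = np.eye(n)[seed]
    p = e.copy()
    for _ in range(iters):
        p = (1 - c) * trans @ p + c * e
    return p  # p[j] is node j's weight in the structural fingerprint of `seed`

adj = np.array([[0., 1., 1., 0.],
                [1., 0., 1., 1.],
                [1., 1., 0., 0.],
                [0., 1., 0., 0.]])
print(rwr_fingerprint(adj, seed=0))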