SPAGAN: Shortest Path Graph Attention Network
Authors: Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, Dacheng Tao
IJCAI 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test SPAGAN on the downstream classification task on several standard datasets, and achieve performances superior to the state of the art. |
| Researcher Affiliation | Academia | 1 Department of Computer Science, Stevens Institute of Technology; 2 College of Computer Science and Technology, Zhejiang University; 3 Department of Computer Science and Engineering, State University of New York at Buffalo; 4 UBTECH Sydney Artificial Intelligence Centre, University of Sydney. {yyang99, xwang135}@stevens.edu, brooksong@zju.edu.cn, jsyuan@buffalo.edu, dacheng.tao@sydney.edu.au |
| Pseudocode | No | The paper describes the proposed method step-by-step in prose, but it does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for their proposed method. |
| Open Datasets | Yes | We use three widely used semi-supervised graph datasets, Cora, Citeseer and Pubmed summarized in Tab. 1. |
| Dataset Splits | Yes | Following the work of [Kipf and Welling, 2016; Veličković et al., 2018], for each dataset, we only use 20 nodes per class for training, 500 nodes for validating and 1000 nodes for testing. (A data-loading sketch follows the table.) |
| Hardware Specification | Yes | The running time of one epoch with path attention on Pubmed dataset is 0.1s on a Nvidia 1080Ti GPU. |
| Software Dependencies | No | We implement SPAGAN under the PyTorch framework [Paszke et al., 2017] and train it with the Adam optimizer. While specific frameworks/optimizers are mentioned, no version numbers for PyTorch or other libraries are provided. |
| Experiment Setup | Yes | For the Cora dataset, we set the learning rate to 0.005 and the weight of L2 regularization to 0.0005; for the Pubmed dataset, we set the learning rate to 0.01 and the weight of L2 regularization to 0.001. For the Citeseer dataset... we set the learning rate to 0.0085 and the weight of L2 regularization to 0.002. For all datasets, we set a tolerance window and stop the training process if there is no lower validation loss within it. We use two graph convolutional layers for all datasets with different attention heads. For the first layer, 8 attention heads are used for each c. Each attention head computes 8 features. Then, an ELU [Clevert et al., 2015] function is applied. In the second layer, we use 8 attention heads for the Pubmed dataset and 1 attention head for the other two datasets. Dropout is applied to the input of each layer and also to the attention coefficients for each node, with a keep probability of 0.4. For all datasets, we set the r to 1.0... The max value of c is set to three for the first layer and two for the last layer for all datasets. The number of iteration steps is set to two. For all datasets, we use early stopping based on the cross-entropy loss on the validation set. (A configuration sketch follows the table.) |
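
The splits quoted in the Dataset Splits row (20 labelled nodes per class, 500 validation nodes, 1000 test nodes) are the standard Planetoid splits for Cora, Citeseer and Pubmed. Below is a minimal loading sketch, assuming the PyTorch Geometric `Planetoid` loader; the paper does not state which loading code the authors actually used.

```python
# Minimal sketch of loading the standard semi-supervised splits
# (20 labelled nodes per class, 500 validation, 1000 test).
# Assumes PyTorch Geometric's Planetoid loader is acceptable;
# this is not the authors' own data pipeline.
from torch_geometric.datasets import Planetoid

for name in ["Cora", "CiteSeer", "PubMed"]:
    data = Planetoid(root=f"data/{name}", name=name)[0]
    print(
        name,
        int(data.train_mask.sum()),  # 20 nodes per class
        int(data.val_mask.sum()),    # 500 validation nodes
        int(data.test_mask.sum()),   # 1000 test nodes
    )
```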
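
The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration. The sketch below wires them into a plain two-layer graph attention network using PyTorch Geometric's `GATConv` as a stand-in backbone; it does not implement SPAGAN's shortest-path attention, and the class names, the `patience` value, and the training-loop details are illustrative assumptions rather than the authors' code.

```python
# Hedged sketch of the reported training configuration. GATConv is a stand-in
# backbone: SPAGAN's shortest-path attention is NOT reproduced here.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GATConv

# Per-dataset hyperparameters quoted in the paper.
HPARAMS = {
    "Cora":     dict(lr=0.005,  weight_decay=0.0005, heads2=1),
    "CiteSeer": dict(lr=0.0085, weight_decay=0.002,  heads2=1),
    "PubMed":   dict(lr=0.01,   weight_decay=0.001,  heads2=8),
}

class TwoLayerAttentionNet(torch.nn.Module):
    """First layer: 8 heads x 8 features + ELU; second layer: dataset-dependent heads."""
    def __init__(self, in_dim, num_classes, heads2):
        super().__init__()
        # The paper's "keep probability of 0.4" corresponds to dropout p = 0.6,
        # applied to layer inputs and to the attention coefficients.
        self.conv1 = GATConv(in_dim, 8, heads=8, dropout=0.6)
        self.conv2 = GATConv(8 * 8, num_classes, heads=heads2,
                             concat=False, dropout=0.6)

    def forward(self, x, edge_index):
        x = F.dropout(x, p=0.6, training=self.training)
        x = F.elu(self.conv1(x, edge_index))
        x = F.dropout(x, p=0.6, training=self.training)
        return self.conv2(x, edge_index)

def train(name="Cora", patience=100, max_epochs=1000):
    # patience is a placeholder: the paper mentions a tolerance window
    # for early stopping but does not report its size.
    hp = HPARAMS[name]
    data = Planetoid(root=f"data/{name}", name=name)[0]
    model = TwoLayerAttentionNet(data.num_node_features,
                                 int(data.y.max()) + 1, hp["heads2"])
    opt = torch.optim.Adam(model.parameters(), lr=hp["lr"],
                           weight_decay=hp["weight_decay"])
    best_val, wait = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        opt.zero_grad()
        out = model(data.x, data.edge_index)
        F.cross_entropy(out[data.train_mask], data.y[data.train_mask]).backward()
        opt.step()
        # Early stopping on validation cross-entropy, as described in the paper.
        model.eval()
        with torch.no_grad():
            val_loss = F.cross_entropy(
                model(data.x, data.edge_index)[data.val_mask],
                data.y[data.val_mask]).item()
        if val_loss < best_val:
            best_val, wait = val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
```

The L2 regularization weights reported in the paper are passed here as Adam's `weight_decay`; whether the authors applied the penalty this way or as an explicit loss term is not specified.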