SPAGAN: Shortest Path Graph Attention Network

Authors: Yiding Yang, Xinchao Wang, Mingli Song, Junsong Yuan, Dacheng Tao

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We test SPAGAN on the downstream classification task on several standard datasets, and achieve performances superior to the state of the art."
Researcher Affiliation | Academia | "1Department of Computer Science, Stevens Institute of Technology 2College of Computer Science and Technology, Zhejiang University 3Department of Computer Science and Engineering, State University of New York at Buffalo 4UBTECH Sydney Artificial Intelligence Centre, University of Sydney {yyang99, xwang135}@stevens.edu, brooksong@zju.edu.cn, jsyuan@buffalo.edu, dacheng.tao@sydney.edu.au"
Pseudocode | No | The paper describes the proposed method step-by-step in prose, but it does not include a structured pseudocode or algorithm block.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the proposed method.
Open Datasets | Yes | "We use three widely used semi-supervised graph datasets, Cora, Citeseer and Pubmed summarized in Tab. 1."
Dataset Splits | Yes | "Following the work of [Kipf and Welling, 2016; Veličković et al., 2018], for each dataset, we only use 20 nodes per class for training, 500 nodes for validating and 1000 nodes for testing."
Hardware Specification | Yes | "The running time of one epoch with path attention on Pubmed dataset is 0.1s on a Nvidia 1080Ti GPU."
Software Dependencies | No | "We implement SPAGAN under Pytorch framework [Paszke et al., 2017] and train it with Adam optimizer." Specific frameworks and optimizers are named, but no version numbers for PyTorch or other libraries are provided.
Experiment Setup | Yes | "For the Cora dataset, we set the learning rate to 0.005 and the weight of L2 regularization to 0.0005; for the Pubmed dataset, we set the learning rate to 0.01 and the weight of L2 regularization to 0.001. For the Citeseer dataset... we set the learning rate to 0.0085 and the weight of L2 regularization to 0.002. For all datasets, we set a tolerance window and stop the training process if there is no lower validation loss within it. We use two graph convolutional layers for all datasets with different attention heads. For the first layer, 8 attention heads for each c is used. Each attention head will compute 8 features. Then, an ELU [Clevert et al., 2015] function is applied. In the second layer, we use 8 attention heads for the Pubmed dataset and 1 attention head for the other two datasets. Dropout is applied to the input of each layer and also to the attention coefficients for each node, with a keep probability of 0.4. For all datasets, we set the r to 1.0... The max value of c is set to be three for the first layer and two for the last layer for all datasets. The steps of iteration is set to two. For all the datasets, we use early stopping based on the cross-entropy loss on validation set."
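The split protocol quoted under Dataset Splits (20 labelled nodes per class for training, 500 for validation, 1000 for testing) can be sketched in plain Python. This is an illustrative reconstruction, not the authors' code; the function name and the "first-come" ordering of the validation/test pools are assumptions.

```python
from collections import defaultdict

def planetoid_style_split(labels, per_class=20, n_val=500, n_test=1000):
    """Build index lists following the protocol quoted above:
    `per_class` labelled nodes per class for training, then the next
    `n_val` nodes for validation and `n_test` nodes for testing.
    (Ordering of the val/test pools is an assumption for illustration.)"""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Take the first `per_class` indices of every class for training.
    train = [i for cls in sorted(by_class) for i in by_class[cls][:per_class]]
    taken = set(train)
    rest = [i for i in range(len(labels)) if i not in taken]
    return train, rest[:n_val], rest[n_val:n_val + n_test]
```

With 7 classes (as in Cora) this yields 140 training nodes, matching the 20-per-class rule, and the three index sets are pairwise disjoint by construction.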
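The Experiment Setup row quotes per-dataset hyperparameters and a "tolerance window" early-stopping rule on validation loss. A minimal sketch of both, assuming a patience-style interpretation of the tolerance window (the paper does not state its size, and the class/parameter names here are hypothetical):

```python
# Per-dataset values as quoted above; everything else is an assumption.
HPARAMS = {
    "cora":     {"lr": 0.005,  "weight_decay": 0.0005},
    "citeseer": {"lr": 0.0085, "weight_decay": 0.002},
    "pubmed":   {"lr": 0.01,   "weight_decay": 0.001},
}

class EarlyStopping:
    """Tolerance-window early stopping on validation loss: stop once no
    new minimum has been observed for `patience` consecutive epochs.
    (`patience` is a hypothetical knob; the paper only says the training
    stops 'if there is no lower validation loss within' the window.)"""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss   # new minimum: reset the window
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1   # no improvement this epoch
        return self.bad_epochs >= self.patience  # True -> stop training
```

In a training loop one would call `stopper.step(val_loss)` once per epoch and break out when it returns `True`; the quoted setup monitors cross-entropy loss on the validation set.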