Graph Transformer Networks

Authors: Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate the benefits of our method against a variety of state-of-the-art models on node classification. We conduct experiments and analysis to answer the following research questions: Q1. Are the new graph structures generated by GTN effective for learning node representation? Q2. Can GTN adaptively produce a variable length of meta-paths depending on datasets? Q3. How can we interpret the importance of each meta-path from the adjacency matrix generated by GTNs?
Researcher Affiliation | Academia | Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim, Department of Computer Science and Engineering, Korea University ({ysj5419, minbyuljeong, raehyun, kangj, hyunwoojkim}@korea.ac.kr)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publicly available at https://github.com/seongjunyun/Graph_Transformer_Networks.
Open Datasets | Yes | Datasets. To evaluate the effectiveness of meta-paths generated by Graph Transformer Networks, we used heterogeneous graph datasets that have multiple types of nodes and edges. The main task is node classification. We use two citation network datasets DBLP and ACM, and a movie dataset IMDB. The statistics of the heterogeneous graphs used in our experiments are shown in Table 1.
Dataset Splits | Yes | Table 1: Datasets for node classification on heterogeneous graphs (restated as a checkable Python dict in the first sketch after this table):
Dataset | # Nodes | # Edges | # Edge type | # Features | # Training | # Validation | # Test
DBLP | 18405 | 67946 | 4 | 334 | 800 | 400 | 2857
ACM | 8994 | 25922 | 4 | 1902 | 600 | 300 | 2125
IMDB | 12772 | 37288 | 4 | 1256 | 300 | 300 | 2339
Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments.
Software Dependencies | No | The paper mentions "The Adam optimizer was used" but does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We set the embedding dimension to 64 for all the above methods for a fair comparison. The Adam optimizer was used and the hyperparameters (e.g., learning rate, weight decay etc.) are respectively chosen so that each baseline yields its best performance. For random walk based models, a walk length is set to 100 per node for 1000 iterations and the window size is set to 5 with 7 negative samples. For GCN, GAT, and HAN, the parameters are optimized using the validation set, respectively. For our model GTN, we used three GT layers for DBLP and IMDB datasets, two GT layers for ACM dataset. (These reported settings are collected into a hedged configuration sketch below.)
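
To make the split sizes in Table 1 easy to check programmatically, here is a minimal sketch that encodes the reported statistics as a plain Python dictionary. The `TABLE1` name, the dictionary layout, and the consistency check are our own illustration, not part of the authors' code; all numbers are taken verbatim from Table 1.

```python
# Table 1 statistics as reported in the paper (node classification on
# heterogeneous graphs). The dictionary layout is our own illustration.
TABLE1 = {
    "DBLP": dict(nodes=18405, edges=67946, edge_types=4, features=334,
                 train=800, val=400, test=2857),
    "ACM":  dict(nodes=8994, edges=25922, edge_types=4, features=1902,
                 train=600, val=300, test=2125),
    "IMDB": dict(nodes=12772, edges=37288, edge_types=4, features=1256,
                 train=300, val=300, test=2339),
}

for name, s in TABLE1.items():
    labeled = s["train"] + s["val"] + s["test"]
    # Only a subset of nodes (the labeled target-type nodes) is split into
    # train/val/test, so the labeled count is well below the node count.
    print(f"{name}: {labeled} labeled of {s['nodes']} nodes "
          f"({100 * labeled / s['nodes']:.1f}%)")
```

Running this shows, for example, that DBLP uses 4057 labeled nodes out of 18405 total, consistent with labels being attached only to one node type.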
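
A compact way to read the reported experiment setup is as a per-dataset configuration. The sketch below gathers only the settings stated in the paper (embedding dimension 64, the Adam optimizer, the random-walk baseline parameters, and 3 GT layers for DBLP/IMDB versus 2 for ACM). The `build_optimizer` helper and its `lr`/`weight_decay` defaults are hypothetical placeholders, since the paper says only that these were tuned per model on the validation set.

```python
import torch

# Settings quoted in the paper: embedding dimension 64 for every method,
# and 3 GT layers for DBLP/IMDB vs. 2 for ACM.
EMBED_DIM = 64
GT_LAYERS = {"DBLP": 3, "IMDB": 3, "ACM": 2}

# Random-walk baseline settings quoted in the paper.
RANDOM_WALK = dict(walk_length=100, iterations=1000,
                   window_size=5, negative_samples=7)

def build_optimizer(model: torch.nn.Module,
                    lr: float = 0.005,
                    weight_decay: float = 0.001) -> torch.optim.Adam:
    # Adam is stated in the paper; the lr/weight_decay defaults here are
    # hypothetical placeholders -- the paper only says hyperparameters
    # were chosen per baseline for best validation performance.
    return torch.optim.Adam(model.parameters(), lr=lr,
                            weight_decay=weight_decay)
```

This mirrors how the released repository is typically driven (one layer-count flag per dataset), but the exact values of the tuned hyperparameters would have to be recovered from the code at https://github.com/seongjunyun/Graph_Transformer_Networks rather than from the paper itself.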