Hierarchical Graph Transformer with Adaptive Node Sampling
Authors: Zaixi Zhang, Qi Liu, Qingyong Hu, Chee-Kong Lee
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive experiments on real-world datasets to demonstrate the superiority of our method over existing graph transformers and popular GNNs. |
| Researcher Affiliation | Collaboration | Zaixi Zhang (1,2), Qi Liu (1,2), Qingyong Hu (3), Chee-Kong Lee (4). 1: Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China; 3: Hong Kong University of Science and Technology; 4: Tencent America |
| Pseudocode | Yes | Algorithm 1 ANS-GT. Input: total training epochs E; p_min; update period T; the number of sampled nodes N. Output: trained Graph Transformer model, optimized w_t. (A hedged sketch of this training loop is given after the table.) |
| Open Source Code | No | 1. Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The code will be released once the paper is accepted. |
| Open Datasets | Yes | To comprehensively evaluate the effectiveness of ANS-GT, we conduct experiments on the six benchmark datasets including citation graphs Cora, CiteSeer, and PubMed [18]; Wikipedia graphs Chameleon, Squirrel; the Actor co-occurrence graph [5]; and WebKB datasets [28] including Cornell, Texas, and Wisconsin. |
| Dataset Splits | Yes | We set the train-validation-test split as 60%/20%/20%. |
| Hardware Specification | Yes | All models were trained on one NVIDIA Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions 'AdamW as the optimizer' but does not specify version numbers for any software dependencies such as programming languages, libraries, or frameworks. |
| Experiment Setup | Yes | Implementation Details. We adopt AdamW as the optimizer and set the hyper-parameter ϵ to 1e-8 and (β1, β2) to (0.99, 0.999). The peak learning rate is set to 2e-4 with a 100-epoch warm-up stage followed by a linear decay learning rate scheduler. We adopt the Variational Neighborhoods [26] with a coarsening rate of 0.01 as the default coarsening method... Parameter Settings. In the default setting, the dropout rate is set to 0.5, the end learning rate is set to 1e-9, the hidden dimension d is set to 128, the number of training epochs is set to 1,000, the update period T is set to 100, N is set to 20, M is set to 10, and the number of attention heads H is set to 8. We tune other hyper-parameters on each dataset by grid search. The search spaces of the batch size, number of data augmentations S, number of layers L, number of sampled nodes, number of sampled super-nodes, and number of global nodes are {8, 16, 32}, {4, 8, 16, 32}, {2, 3, 4, 5, 6}, {10, 15, 20, 25}, {0, 3, 6, 9}, {1, 2, 3} respectively. (An optimizer and learning-rate schedule sketch consistent with these settings is given after the table.) |
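The Algorithm 1 excerpt quoted in the Pseudocode row only lists inputs and outputs. The following is a minimal, hypothetical sketch of how such a loop could be organised: a toy linear model stands in for the graph transformer, random node batches stand in for the adaptive sampler, and the reward signal is a dummy placeholder (the paper derives rewards from attention scores). Names such as `train_sketch` and the Exp3-style weight update are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

def train_sketch(E=1000, T=100, N=20, p_min=0.05, n_strategies=4,
                 n_nodes=500, feat_dim=16, n_classes=3):
    # Toy graph data so the sketch runs end to end.
    x = torch.randn(n_nodes, feat_dim)                  # node features
    y = torch.randint(0, n_classes, (n_nodes,))         # node labels
    model = nn.Linear(feat_dim, n_classes)              # stand-in for the graph transformer
    opt = torch.optim.AdamW(model.parameters(), lr=2e-4,
                            betas=(0.99, 0.999), eps=1e-8)
    w = torch.ones(n_strategies)                        # bandit weights over sampling heuristics

    for epoch in range(E):
        # Mix the heuristics with a probability floor p_min, as listed in the algorithm inputs.
        probs = p_min + (1.0 - n_strategies * p_min) * w / w.sum()
        strategy = torch.multinomial(probs, 1).item()   # chosen heuristic would drive sampling;
        batch = torch.randperm(n_nodes)[:N]             # random nodes stand in for its N samples
        loss = nn.functional.cross_entropy(model(x[batch]), y[batch])
        opt.zero_grad()
        loss.backward()
        opt.step()

        if (epoch + 1) % T == 0:
            # Every T epochs the heuristics are rewarded; a dummy reward keeps this runnable.
            rewards = torch.rand(n_strategies)
            w = w * torch.exp(p_min * rewards / probs)  # Exp3-style multiplicative update

    return model, w                                     # trained model and optimized weights w_t
```

The section below sketches an optimizer and learning-rate schedule matching the hyper-parameters quoted in the Experiment Setup row (AdamW with eps=1e-8 and betas=(0.99, 0.999), peak learning rate 2e-4, 100-epoch linear warm-up, linear decay towards the 1e-9 end learning rate over 1,000 epochs). The scheduler shape is an assumption consistent with those numbers, not the authors' exact implementation, and the model is a toy stand-in.

```python
import torch

def build_optimizer(model, peak_lr=2e-4, end_lr=1e-9, warmup=100, total_epochs=1000):
    opt = torch.optim.AdamW(model.parameters(), lr=peak_lr,
                            betas=(0.99, 0.999), eps=1e-8)

    def lr_lambda(epoch):
        if epoch < warmup:
            # Linear warm-up to the peak learning rate over the first `warmup` epochs.
            return (epoch + 1) / warmup
        # Linear decay from the peak towards end_lr for the remaining epochs.
        progress = (epoch - warmup) / max(1, total_epochs - warmup - 1)
        return max(end_lr / peak_lr, 1.0 - progress)

    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched

# Usage: step the scheduler once per epoch alongside the optimizer.
model = torch.nn.Linear(128, 7)          # toy stand-in for the ANS-GT network
opt, sched = build_optimizer(model)
for epoch in range(1000):
    opt.step()                           # placeholder for the real forward/backward/step
    sched.step()
```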
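In both sketches the training data, model, and reward signals are synthetic placeholders; only the hyper-parameter values (E, T, N, learning rates, betas, eps) are taken from the quotes above.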