Boosting Graph Structure Learning with Dummy Nodes

Authors: Xin Liu, Jiayang Cheng, Yangqiu Song, Xin Jiang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments are conducted on graph classification and subgraph isomorphism counting and matching, and empirical results reveal the success of learning with graphs with dummy nodes.
Researcher Affiliation | Collaboration | (1) Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong SAR, China; (2) Huawei Noah's Ark Lab, Hong Kong SAR, China.
Pseudocode | Yes | Algorithm 1: Edge-to-vertex transform LΦ (see the sketch after this table).
Open Source Code | Yes | Code is publicly released at https://github.com/HKUST-KnowComp/DummyNode4GraphLearning.
Open Datasets | Yes | Datasets. We select four benchmarking datasets where current state-of-the-art models face overfitting problems: PROTEINS (Borgwardt et al., 2005), D&D (Dobson & Doig, 2003), NCI109, and NCI1 (Wale et al., 2008).
Dataset Splits | Yes | Following previous work (Zhang et al., 2019), we randomly split each dataset into the training set (80%), the validation set (10%), and the test set (10%) in each run. (A split sketch follows the table.)
Hardware Specification | Yes | We conduct our experiments on one CentOS 7 server with 2 Intel Xeon Gold 5215 CPUs and 4 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | Yes | The software versions are: GNU C++ Compiler 5.2.0, Python 3.7.3, PyTorch 1.7.1, torch-geometric 2.0.2, and DGL 0.6.0.
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) to optimize the models. Following Zhang et al. (2019), an early stopping strategy with patience 100 is adopted during training, i.e., training is stopped when the loss on the validation set does not decrease for over 100 epochs. For GraphSAGE, GIN, and DiffPool, the optimal hyper-parameters are found using grid search within the same search ranges as in (Errica et al., 2020). For HGP-SL, we follow the official hyper-parameters reported in their GitHub repository. For models using the graph convolutional operator (Kipf & Welling, 2017) (GCN, DiffPool, HGP-SL), we additionally impose a learnable weight γ on the dummy edges. The weights for all other edges are set to 1, and γ is initialized with different values in {0.01, 0.1, 1, 10}. For the relational models RGCN and RGIN, we search for the learning rate within {1e-2, 1e-3, 1e-4}, batch size within {128, 512}, hidden dimension within {32, 64}, dropout ratio within {0, 0.5}, and number of layers within {2, 4}. (An early-stopping sketch follows the table.)
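
The Pseudocode row refers to Algorithm 1, an edge-to-vertex transform. The sketch below shows only the plain line-graph construction (each edge of G becomes a node of L(G), and two such nodes are linked when the original edges share an endpoint); it is not the paper's exact LΦ, which also accounts for the dummy node, and the function name is illustrative.

```python
from itertools import combinations

def edge_to_vertex_transform(edges):
    """Build the line graph L(G) of an undirected graph G.

    `edges` is an iterable of (u, v) pairs. Each edge of G becomes a node of
    L(G); two such nodes are adjacent iff the original edges share an endpoint.
    Returns (nodes_of_L, edges_of_L).
    """
    # Store each original edge with sorted endpoints so (u, v) == (v, u).
    line_nodes = [tuple(sorted(e)) for e in edges]

    line_edges = []
    for e1, e2 in combinations(line_nodes, 2):
        # Adjacent in L(G) iff the two original edges share an endpoint.
        if set(e1) & set(e2):
            line_edges.append((e1, e2))
    return line_nodes, line_edges


if __name__ == "__main__":
    # A triangle plus a pendant edge: 0-1, 1-2, 0-2, 2-3.
    g_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    nodes, links = edge_to_vertex_transform(g_edges)
    print(nodes)   # [(0, 1), (1, 2), (0, 2), (2, 3)]
    print(links)   # pairs of original edges that share an endpoint
```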
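
The Dataset Splits row describes a random 80%/10%/10% split per run. A minimal sketch of one way to produce such a split, assuming graphs are addressed by integer indices and a per-run seed (the function name and arguments are illustrative, not taken from the released code):

```python
import random

def split_indices(num_graphs, seed, train_frac=0.8, val_frac=0.1):
    """Randomly split graph indices into train/validation/test sets (80/10/10)."""
    rng = random.Random(seed)
    indices = list(range(num_graphs))
    rng.shuffle(indices)

    n_train = int(train_frac * num_graphs)
    n_val = int(val_frac * num_graphs)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]   # the remaining ~10%
    return train, val, test


# One split per run, controlled by the seed (e.g., PROTEINS has 1113 graphs).
train_idx, val_idx, test_idx = split_indices(num_graphs=1113, seed=0)
```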
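
The Experiment Setup row reports Adam with early stopping at a patience of 100 epochs on the validation loss. A minimal PyTorch-style sketch of such a loop, assuming caller-supplied training and evaluation callables; none of these names come from the released code.

```python
import copy
import torch

def train_with_early_stopping(model, run_train_epoch, eval_val_loss,
                              lr=1e-3, max_epochs=1000, patience=100):
    """Train with Adam; stop when validation loss has not improved for `patience` epochs.

    `run_train_epoch(model, optimizer)` and `eval_val_loss(model)` are
    caller-supplied callables (illustrative placeholders).
    """
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    stale_epochs = 0

    for _ in range(max_epochs):
        run_train_epoch(model, optimizer)        # one optimization pass over the training set
        val_loss = eval_val_loss(model)          # mean loss on the validation set

        if val_loss < best_val_loss:
            best_val_loss, stale_epochs = val_loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale_epochs += 1
            if stale_epochs >= patience:         # patience of 100 epochs, as reported
                break

    model.load_state_dict(best_state)            # restore the best validation checkpoint
    return model
```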