Learning Large Graph Property Prediction via Graph Segment Training

Authors: Kaidi Cao, Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our complete method GST+EFD (with all the techniques together) on two large graph property prediction benchmarks: MalNet and TpuGraphs. Our experiments show that GST+EFD is both memory-efficient and fast, while offering a slight boost on test accuracy over a typical full graph training regime. We evaluate our method on the following datasets: MalNet-Tiny, MalNet-Large and TpuGraphs.
Researcher Affiliation | Collaboration | Kaidi Cao¹, Phitchaya Mangpo Phothilimthana², Sami Abu-El-Haija², Dustin Zelle², Yanqi Zhou², Charith Mendis³, Jure Leskovec¹, Bryan Perozzi² (¹Stanford University, ²Google, ³UIUC)
Pseudocode | Yes | Algorithm 1 (General Framework of GST) and Algorithm 2 (Pipeline of GST+EFD); an illustrative training-step sketch follows the table.
Open Source Code | Yes | Source code available at https://github.com/kaidic/GST.
Open Datasets | Yes | We evaluate our method on the following datasets: MalNet-Tiny, MalNet-Large and TpuGraphs. MalNet [10] is a large-scale graph representation learning dataset... TpuGraphs is an internal large-scale graph regression dataset, whose goal is to predict the execution time of an XLA HLO graph with a specific compiler configuration on a Tensor Processing Unit (TPU). We have made public a dataset [24] that closely parallels our internal dataset.
Dataset Splits | No | The paper states the total number of graphs in each dataset, e.g., "MalNet-Tiny, containing 5,000 graphs", "MalNet-Large also contains 5,000 graphs", and "TpuGraphs contains 5,153 HLO graphs". It reports test accuracy and train/test OPA results, implying that standard splits were used, but it does not give explicit split percentages, per-split sample counts, or a reference to a predefined split (e.g., a standard 80/10/10 split) for the training, validation, and test sets.
Hardware Specification | Yes | We conduct all the experiments on MalNet with a single NVIDIA V100 GPU with 16GB of memory, and four NVIDIA V100 GPUs (for data parallelism) with 16GB of memory for TpuGraphs.
Software Dependencies | No | The paper states: "Our code is implemented in PyTorch [22]." However, it does not provide specific version numbers for PyTorch or any other software dependencies, such as Python or other libraries.
Experiment Setup | Yes | For MalNet: We use the Adam optimizer [18] with a base learning rate of 0.01 for GCN and SAGE. For GraphGPS, we use the AdamW optimizer [20] with a cosine scheduler and a base learning rate of 0.0005. We train for 600 epochs until convergence. For Prediction Head Finetuning, we finetune for another 100 epochs. We limit the maximum segment size to 5,000 nodes, and use a keep probability p = 0.5 if not otherwise specified. We train with Cross Entropy loss. For TpuGraphs: We use the Adam optimizer with a base learning rate of 0.0001. We train for 200,000 iterations until convergence. We by default limit the maximum segment size to 8,192 nodes, and use a keep probability p = 0.5 if not otherwise specified. Since we care more about relative ranking than absolute runtime, we use Pairwise Hinge loss within a batch during training (an illustrative loss sketch follows the table).
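
For readers without the algorithm listings at hand, below is a minimal PyTorch sketch of a single GST+EFD training step as outlined by Algorithms 1-2: each graph is pre-partitioned into segments, a random subset of segments (keep probability p) is encoded with gradients, the remaining segments reuse stale embeddings from a historical embedding table with Stale Embedding Dropout applied, and the aggregated graph embedding feeds the prediction head. The names (`encoder`, `head`, `emb_table`), the mean-pooling aggregation, and the cold-start handling are assumptions, not the authors' implementation; the Prediction Head Finetuning phase is omitted.

```python
import torch
import torch.nn.functional as F

def gst_efd_step(encoder, head, loss_fn, optimizer, segments, target,
                 graph_id, emb_table, keep_prob=0.5, stale_dropout=0.5):
    """One hypothetical training step on a graph pre-partitioned into `segments`."""
    kept = [bool(torch.rand(()) < keep_prob) for _ in segments]
    if not any(kept):
        kept[0] = True  # always back-propagate through at least one segment

    seg_embs = []
    for i, seg in enumerate(segments):
        if kept[i]:
            z = encoder(seg)                       # fresh segment embedding, with gradients
            emb_table[(graph_id, i)] = z.detach()  # refresh the historical embedding table ("E")
        else:
            z = emb_table.get((graph_id, i))
            if z is None:                          # cold start: compute once, without gradients
                z = encoder(seg).detach()
            z = F.dropout(z, p=stale_dropout)      # Stale Embedding Dropout ("D")
        seg_embs.append(z)

    graph_emb = torch.stack(seg_embs).mean(dim=0)  # aggregate segment embeddings (mean pooling assumed)
    loss = loss_fn(head(graph_emb), target)        # graph-level prediction loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because gradients flow only through the kept segments while the rest are served from the table, peak memory scales with the segment budget rather than the full graph size, which is the property the paper exploits.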
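
The "Pairwise Hinge loss within a batch" used for TpuGraphs ranking can likewise be sketched as follows; the margin value and the convention that higher scores mean faster configurations are assumptions for illustration, not values taken from the paper.

```python
import torch

def pairwise_hinge_loss(pred, runtime, margin=0.1):
    # pred:    (B,) model scores for B configurations of one graph (higher = predicted faster; assumed convention)
    # runtime: (B,) measured runtimes (lower = actually faster)
    diff = pred.unsqueeze(1) - pred.unsqueeze(0)                    # diff[i, j] = pred[i] - pred[j]
    faster = (runtime.unsqueeze(1) < runtime.unsqueeze(0)).float()  # 1 where config i is truly faster than j
    # Penalize ordered pairs where the truly faster config is not scored at least `margin` higher.
    return (faster * torch.relu(margin - diff)).sum() / faster.sum().clamp(min=1)
```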