Uncovering the Redundancy in Graph Self-supervised Learning Models

Authors: Zhibiao Wang, Xiao Wang, Haoyue Deng, Nian Liu, Shirui Pan, Chunming Hu

NeurIPS 2024

Reproducibility assessment. Each entry below lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. LLM response: "In this paper, by presenting the experimental evidence and analysis, we surprisingly discover that the graph self-supervised learning models are highly redundant at both of neuron and layer levels, e.g., even randomly removing 51.6% of parameters, the performance of graph self-supervised learning models still retains at least 96.2%." and "We conduct experiments and surprisingly find out that the graph self-supervised learning models are actually highly redundant."
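The redundancy claim above amounts to a simple measurement: randomly prune a pretrained encoder and check how much downstream accuracy survives. The sketch below illustrates that idea under stated assumptions; it is not the paper's exact protocol, and `encoder` and `evaluate` are hypothetical placeholders for a pretrained GNN and a linear-probe evaluation routine.

```python
# Hedged sketch (not the paper's exact protocol): randomly zero out a fraction of a
# pretrained encoder's weights and compare downstream accuracy before and after.
import copy
import torch

def randomly_prune(model: torch.nn.Module, drop_ratio: float = 0.516) -> torch.nn.Module:
    """Return a copy of `model` with `drop_ratio` of its parameters zeroed at random."""
    pruned = copy.deepcopy(model)
    with torch.no_grad():
        for param in pruned.parameters():
            mask = (torch.rand_like(param) >= drop_ratio).float()
            param.mul_(mask)  # keep each weight with probability 1 - drop_ratio
    return pruned

# Hypothetical usage: `encoder` is a pretrained GNN, `evaluate` runs the
# linear-probe node-classification protocol and returns accuracy.
# acc_full = evaluate(encoder)
# acc_pruned = evaluate(randomly_prune(encoder, 0.516))
# retention = acc_pruned / acc_full   # the paper reports retention of at least 96.2%
```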
Researcher Affiliation: Academia. Zhibiao Wang, Xiao Wang, Haoyue Deng, and Chunming Hu (Beihang University, China); Nian Liu (Beijing University of Posts and Telecommunications, China); Shirui Pan (Griffith University, Australia).
Pseudocode: No. LLM response: The paper provides a conceptual diagram of the SLIDE framework (Figure 4) and describes its components and optimization process mathematically, but it does not include structured pseudocode or an algorithm block.
Open Source Code: Yes. LLM response: Code is available at https://github.com/zhlgg/SLIDE.
Open Datasets: Yes. LLM response: "For a comprehensive comparison, we use six real-world datasets to evaluate the performance of node classification (i.e., Cora, Citeseer, Pubmed, Amazon-Photo, Amazon-Computers and Ogbn-arxiv)." The paper cites [16] for the Amazon datasets, [17] for Cora, Citeseer, and Pubmed, and [18] for Ogbn-arxiv, indicating publicly available and well-established datasets.
Dataset Splits: Yes. LLM response: Table 10 (Dataset Statistics) reports the splits used:
Dataset | # Nodes | # Edges | # Features | # Classes | Split (train/val/test)
Cora | 2,708 | 10,556 | 1,433 | 7 | 140/500/1,000
Citeseer | 3,327 | 9,104 | 3,703 | 6 | 120/500/1,000
Pubmed | 19,717 | 88,648 | 500 | 3 | 60/500/1,000
Amazon-Photo | 7,650 | 238,162 | 745 | 8 | 10%/10%/80%
Amazon-Computers | 13,752 | 491,722 | 767 | 10 | 10%/10%/80%
Ogbn-arxiv | 169,343 | 2,315,598 | 128 | 40 | 90,941/29,799/48,603
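All six benchmarks are available through standard loaders. The sketch below assumes PyTorch Geometric and OGB (the report does not state the exact data pipeline used); the Planetoid and OGB datasets ship with the fixed public splits listed above, while the 10%/10%/80% Amazon splits would have to be generated by the user.

```python
# Hedged sketch (assumes PyTorch Geometric and the OGB package; not the authors' code):
# load the six benchmark datasets with their standard public splits where available.
from torch_geometric.datasets import Planetoid, Amazon
from ogb.nodeproppred import PygNodePropPredDataset

cora = Planetoid(root="data/Planetoid", name="Cora")          # 140/500/1,000 split masks
citeseer = Planetoid(root="data/Planetoid", name="Citeseer")  # 120/500/1,000
pubmed = Planetoid(root="data/Planetoid", name="Pubmed")      # 60/500/1,000

# Amazon-Photo / Amazon-Computers ship without splits; the 10%/10%/80% random split
# listed in Table 10 would need to be generated separately.
photo = Amazon(root="data/Amazon", name="Photo")
computers = Amazon(root="data/Amazon", name="Computers")

# Ogbn-arxiv uses the fixed 90,941/29,799/48,603 split provided by OGB.
arxiv = PygNodePropPredDataset(name="ogbn-arxiv", root="data/OGB")
split_idx = arxiv.get_idx_split()  # keys: "train", "valid", "test"
```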
Hardware Specification: Yes. LLM response: "All experiments are conducted on Linux servers equipped with NVIDIA RTX A5000 GPUs (22729 MB)."
Software Dependencies: No. LLM response: The paper mentions using the Adam optimizer and provides links to the official code implementations of GraphMAE, GRACE, and MaskGAE, but it does not specify version numbers for any software libraries, frameworks, or programming languages used.
Experiment Setup: Yes. LLM response: "For GraphMAE, we obtain the hyper-parameters of pre-training on Amazon-Photo and Amazon-Computers by ourselves. For both datasets, linear probes are trained using Adam with a learning rate of 0.01, momentum of 0.9 and weight decay of 0.0005 while GNNs are pre-trained with a learning rate of 0.001, weight decay of 0, hidden number of 1024, head number of 4, layer number of 2, mask rate of 0.5, drop edge rate of 0.5 and epoch number of 1000." Appendix C.3 provides extensive hyper-parameter details for full fine-tuning, GRACE, MaskGAE, and SLIDE across the datasets.
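For reference, the quoted GraphMAE settings can be collected into a configuration along the lines of the sketch below. This is a minimal illustration rather than the authors' training code; `make_probe_optimizer` is a hypothetical helper, and the quoted "momentum of 0.9" is read as Adam's beta1.

```python
# Hedged sketch (configuration values as quoted above; the model class and the rest of
# the training loop are hypothetical placeholders).
import torch

pretrain_cfg = dict(
    lr=1e-3, weight_decay=0.0, hidden_dim=1024, num_heads=4, num_layers=2,
    mask_rate=0.5, drop_edge_rate=0.5, epochs=1000,
)
probe_cfg = dict(lr=0.01, weight_decay=5e-4)  # "momentum of 0.9" maps to Adam's beta1

def make_probe_optimizer(linear_probe: torch.nn.Module) -> torch.optim.Optimizer:
    # Linear probes are trained with Adam; beta1=0.9 plays the role of momentum.
    return torch.optim.Adam(
        linear_probe.parameters(),
        lr=probe_cfg["lr"],
        betas=(0.9, 0.999),
        weight_decay=probe_cfg["weight_decay"],
    )
```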