Uncovering the Redundancy in Graph Self-supervised Learning Models
Authors: Zhibiao Wang, Xiao Wang, Haoyue Deng, Nian Liu, Shirui Pan, Chunming Hu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, by presenting experimental evidence and analysis, we surprisingly discover that graph self-supervised learning models are highly redundant at both the neuron and layer levels; e.g., even when randomly removing 51.6% of parameters, graph self-supervised learning models still retain at least 96.2% of their performance. We conduct experiments and surprisingly find that graph self-supervised learning models are actually highly redundant. (A hypothetical random-pruning sketch illustrating this measurement follows the table.) |
| Researcher Affiliation | Academia | Zhibiao Wang1, Xiao Wang1, Haoyue Deng1, Nian Liu2, Shirui Pan3, Chunming Hu1; 1 Beihang University, China; 2 Beijing University of Posts and Telecommunications, China; 3 Griffith University, Australia |
| Pseudocode | No | The paper provides a conceptual diagram of the SLIDE framework (Figure 4) and describes its components and optimization process mathematically, but it does not include structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code available at https://github.com/zhlgg/SLIDE |
| Open Datasets | Yes | For a comprehensive comparison, we use six real-world datasets to evaluate the performance of node classification (i.e., Cora, Citeseer, Pubmed, Amazon-Photo, Amazon-Computers and Ogbn-arxiv). The paper cites [16] for the Amazon datasets, [17] for Cora, Citeseer, and Pubmed, and [18] for Ogbn-arxiv, indicating publicly available and well-established datasets. |
| Dataset Splits | Yes | Table 10 (Dataset Statistics): Cora: 2,708 nodes, 10,556 edges, 1,433 features, 7 classes, split 140/500/1,000; Citeseer: 3,327 nodes, 9,104 edges, 3,703 features, 6 classes, split 120/500/1,000; Pubmed: 19,717 nodes, 88,648 edges, 500 features, 3 classes, split 60/500/1,000; Photo: 7,650 nodes, 238,162 edges, 745 features, 8 classes, split 10%/10%/80%; Computers: 13,752 nodes, 491,722 edges, 767 features, 10 classes, split 10%/10%/80%; arXiv: 169,343 nodes, 2,315,598 edges, 128 features, 40 classes, split 90,941/29,799/48,603. (A loading/split sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on Linux servers equipped with NVIDIA RTX A5000 GPUs (22729 MB). |
| Software Dependencies | No | The paper mentions using the Adam optimizer and provides links to the official code implementations for GraphMAE, GRACE, and MaskGAE. However, it does not specify exact version numbers for any software libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | For GraphMAE, we obtain the hyper-parameters of pre-training on Amazon-Photo and Amazon-Computers by ourselves. For both datasets, linear probes are trained using Adam with a learning rate of 0.01, momentum of 0.9 and weight decay of 0.0005, while GNNs are pre-trained with a learning rate of 0.001, weight decay of 0, hidden number of 1024, head number of 4, layer number of 2, mask rate of 0.5, drop edge rate of 0.5 and epoch number of 1000. Appendix C.3 provides extensive hyper-parameter details for full fine-tuning, GRACE, MaskGAE, and SLIDE across various datasets. (A hedged configuration sketch follows the table.) |
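
The redundancy measurement referenced in the Research Type row can be probed in spirit with a simple random-pruning check. The sketch below is a minimal illustration in PyTorch, not the authors' code: `randomly_prune` zeroes a chosen fraction of a pretrained encoder's weights, and the commented usage assumes a hypothetical `evaluate_linear_probe` helper that trains a linear classifier on frozen embeddings and returns test accuracy.

```python
import copy
import torch


@torch.no_grad()
def randomly_prune(model: torch.nn.Module, drop_ratio: float, seed: int = 0) -> torch.nn.Module:
    """Return a deep copy of `model` with `drop_ratio` of its parameters zeroed at random."""
    torch.manual_seed(seed)
    pruned = copy.deepcopy(model)
    for param in pruned.parameters():
        # Keep each weight with probability (1 - drop_ratio), zero it otherwise.
        mask = (torch.rand_like(param) >= drop_ratio).to(param.dtype)
        param.mul_(mask)
    return pruned


# Hypothetical usage (placeholders, not the authors' pipeline):
# `evaluate_linear_probe(encoder, data)` would train a linear classifier on the
# frozen encoder's embeddings and return test accuracy, as in standard GSSL evaluation.
#
# full_acc   = evaluate_linear_probe(encoder, data)
# pruned_acc = evaluate_linear_probe(randomly_prune(encoder, drop_ratio=0.516), data)
# print(f"retained performance: {pruned_acc / full_acc:.1%}")
```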
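The datasets and splits in the Dataset Splits row are all publicly available; the sketch below shows one plausible way to load them with PyTorch Geometric and OGB and to build the 10%/10%/80% random splits for Amazon-Photo/Computers. This pipeline is an assumption for illustration, not the authors' released code (see their repository for the exact setup).

```python
import torch
from torch_geometric.datasets import Planetoid, Amazon
from ogb.nodeproppred import PygNodePropPredDataset


def random_split(num_nodes: int, train_ratio: float = 0.1, val_ratio: float = 0.1, seed: int = 0):
    """Build 10%/10%/80% train/val/test index tensors, as listed for Photo/Computers."""
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=gen)
    n_train = int(train_ratio * num_nodes)
    n_val = int(val_ratio * num_nodes)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]


# Planetoid datasets (Cora/Citeseer/Pubmed) ship with the fixed public splits
# (140/500/1,000 etc.) in data.train_mask / val_mask / test_mask.
cora = Planetoid(root="data", name="Cora")[0]

# The Amazon datasets have no canonical split; a random 10%/10%/80% split is built here.
photo = Amazon(root="data", name="Photo")[0]
train_idx, val_idx, test_idx = random_split(photo.num_nodes)

# ogbn-arxiv ships with the standard 90,941/29,799/48,603 split.
arxiv = PygNodePropPredDataset(name="ogbn-arxiv", root="data")
split_idx = arxiv.get_idx_split()  # keys: "train", "valid", "test"
```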
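The Experiment Setup row lists concrete hyper-parameters for GraphMAE pre-training and linear probing on the Amazon datasets. Below is a hedged sketch of how those values might be wired into a PyTorch configuration; the `pretrain_cfg` dictionary and `make_probe_optimizer` helper are illustrative names, and mapping the reported momentum of 0.9 to Adam's beta1 is an assumption.

```python
import torch

# Illustrative config mirroring the reported GraphMAE pre-training settings on
# Amazon-Photo / Amazon-Computers: hidden 1024, 4 heads, 2 layers, mask rate 0.5,
# drop-edge rate 0.5, 1,000 epochs, lr 0.001, weight decay 0.
pretrain_cfg = dict(
    lr=1e-3,
    weight_decay=0.0,
    hidden_dim=1024,
    num_heads=4,
    num_layers=2,
    mask_rate=0.5,
    drop_edge_rate=0.5,
    epochs=1000,
)


def make_probe_optimizer(probe: torch.nn.Module) -> torch.optim.Optimizer:
    """Linear probe optimizer: Adam with lr 0.01, weight decay 0.0005; the reported
    momentum of 0.9 is taken here to correspond to Adam's beta1 (an assumption)."""
    return torch.optim.Adam(
        probe.parameters(), lr=0.01, betas=(0.9, 0.999), weight_decay=5e-4
    )
```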