Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering
Authors: Yaming Yang, Ziyu Guan, Zhe Wang, Wei Zhao, Cai Xu, Weigang Lu, Jianbin Huang
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four real-world datasets demonstrate the superior effectiveness of SHGP against state-of-the-art unsupervised baselines and even semi-supervised baselines. In this section, we verify the generalization ability of the proposed SHGP by transferring the pre-trained object embeddings to various downstream tasks including object classification, object clustering, and embedding visualization. |
| Researcher Affiliation | Academia | Yaming Yang, Ziyu Guan, Zhe Wang, Wei Zhao, Cai Xu, Weigang Lu, Jianbin Huang School of Computer Science and Technology, Xidian University {yym@, zyguan@, zwang@stu., ywzhao@mail., cxu@, wglu@stu., jbhuang@}xidian.edu.cn |
| Pseudocode | Yes | Algorithm 1 The overall procedure of SHGP |
| Open Source Code | Yes | We release our source code at: https://github.com/kepsail/SHGP. |
| Open Datasets | Yes | In the experiments, we use four publicly available HIN benchmark datasets, which are widely used in previous related works [38, 32, 18, 23, 33]. Their statistics are summarized in Table 1. Please see Appendix A.1 for more details of these datasets. |
| Dataset Splits | Yes | On each dataset, for the objects that have ground-truth labels, we randomly select {4%, 6%, 8%} objects as the training set. The others are divided equally as the validation set and the test set. |
| Hardware Specification | Yes | All the experiments are conducted on an NVIDIA GTX 1080Ti GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'Xavier uniform distribution' but does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch/TensorFlow versions). |
| Experiment Setup | Yes | For the proposed SHGP, in all the experiments, we use two HGCN layers as the Att-HGNN encoder, and search the dimensionalities of the hidden layers in the set {64, 128, 256, 512}. All the model parameters are initialized by the Xavier uniform distribution [6], and they are optimized through the Adam optimizer. The learning rate and weight decay are searched from 1e-4 to 1e-2. For the number of warm-up epochs, we search its best value in the set {5, 10, 20, 30, 40, 50}. |
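
The Dataset Splits and Experiment Setup rows describe a concrete procedure, so a minimal Python sketch may help anyone trying to reproduce it. This is not the authors' code. The function names `split_labeled_objects` and `discrete_configs`, the `seed` argument, the rounding rule, and the handling of an odd remainder are all assumptions made for illustration. Only the training fractions {4%, 6%, 8%}, the equal validation/test division, and the two discrete search sets come from the quoted text. The learning rate and weight decay are left out because the source gives only their range (1e-4 to 1e-2), not the values tried.

```python
import itertools
import random


def split_labeled_objects(labeled_ids, train_fraction, seed=0):
    """Randomly pick a training set, then halve the rest into validation and test.

    The seed and the rounding rule are assumptions; the source states neither.
    """
    ids = list(labeled_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    train, rest = ids[:n_train], ids[n_train:]
    half = len(rest) // 2  # assumption: with an odd remainder, test gets the extra object
    return train, rest[:half], rest[half:]


# Discrete search sets quoted in the Experiment Setup row.
HIDDEN_DIMS = [64, 128, 256, 512]
WARMUP_EPOCHS = [5, 10, 20, 30, 40, 50]


def discrete_configs():
    """Yield every combination of hidden dimensionality and warm-up epochs."""
    for hidden_dim, warmup in itertools.product(HIDDEN_DIMS, WARMUP_EPOCHS):
        yield {"hidden_dim": hidden_dim, "warmup_epochs": warmup}


if __name__ == "__main__":
    # 1000 labeled objects is a made-up count used only to exercise the function.
    for fraction in (0.04, 0.06, 0.08):
        train, val, test = split_labeled_objects(range(1000), fraction)
        print(fraction, len(train), len(val), len(test))
    print(sum(1 for _ in discrete_configs()), "discrete configurations")
```

Whether the authors searched the two discrete sets jointly, as the product here does, or one at a time is not stated in the quoted text.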