Self-supervised Heterogeneous Graph Pre-training Based on Structural Clustering

Authors: Yaming Yang, Ziyu Guan, Zhe Wang, Wei Zhao, Cai Xu, Weigang Lu, Jianbin Huang

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on four real-world datasets demonstrate the superior effectiveness of SHGP against state-of-the-art unsupervised baselines and even semi-supervised baselines." "In this section, we verify the generalization ability of the proposed SHGP by transferring the pre-trained object embeddings to various downstream tasks including object classification, object clustering, and embedding visualization."
Researcher Affiliation | Academia | "Yaming Yang, Ziyu Guan, Zhe Wang, Wei Zhao, Cai Xu, Weigang Lu, Jianbin Huang. School of Computer Science and Technology, Xidian University. {yym@, zyguan@, zwang@stu., ywzhao@mail., cxu@, wglu@stu., jbhuang@}xidian.edu.cn"
Pseudocode | Yes | "Algorithm 1: The overall procedure of SHGP"
Open Source Code | Yes | "We release our source code at: https://github.com/kepsail/SHGP."
Open Datasets | Yes | "In the experiments, we use four publicly available HIN benchmark datasets, which are widely used in previous related works [38, 32, 18, 23, 33]. Their statistics are summarized in Table 1. Please see Appendix A.1 for more details of these datasets."
Dataset Splits | Yes | "On each dataset, for the objects that have ground-truth labels, we randomly select {4%, 6%, 8%} objects as the training set. The others are divided equally as the validation set and the test set." (a split sketch follows the table)
Hardware Specification | Yes | "All the experiments are conducted on an NVIDIA GTX 1080Ti GPU."
Software Dependencies | No | The paper mentions software components such as the Adam optimizer and Xavier uniform initialization, but does not provide version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | "For the proposed SHGP, in all the experiments, we use two HGCN layers as the Att-HGNN encoder, and search the dimensionalities of the hidden layers in the set {64, 128, 256, 512}. All the model parameters are initialized by the Xavier uniform distribution [6], and they are optimized through the Adam optimizer. The learning rate and weight decay are searched from 1e-4 to 1e-2. For the number of warm-up epochs, we search its best value in the set {5, 10, 20, 30, 40, 50}." (a setup sketch follows the table)