FUG: Feature-Universal Graph Contrastive Pre-training for Graphs with Diverse Node Features

Authors: Jitao Zhao, Di Jin, Meng Ge, Lianze Shan, Xin Wang, Dongxiao He, Zhiyong Feng

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conducted extensive experiments to validate FUG's efficacy. In the in-domain self-supervised learning experiments, FUG demonstrated performance competitive with existing advanced graph self-supervised models. In the cross-domain learning experiments, FUG trained and tested on different datasets showed performance similar to the model trained and tested on the same dataset.
Researcher Affiliation | Academia | Jitao Zhao (1), Di Jin (1), Meng Ge (2), Lianze Shan (1), Xin Wang (1), Dongxiao He (1), Zhiyong Feng (1); (1) College of Intelligence and Computing, Tianjin University, China; (2) Department of Electrical and Computer Engineering, National University of Singapore, Singapore
Pseudocode | Yes | To further illustrate how FUG works, we provide pseudocode, as shown in Algorithm 1.
Open Source Code | Yes | The source code is available at: https://github.com/hedongxiao-tju/FUG.
Open Datasets | Yes | We follow many prior works [5, 42, 32, 33] and evaluate our performance on seven widely used public datasets: Cora, CiteSeer, PubMed [11, 12], Photo, Computers [38], CS, and Physics [39]. (A loading sketch using torch_geometric follows the table.)
Dataset Splits | Yes | For the node classification experiments, to simulate a few-shot scenario, we split the dataset into train, valid, and test sets by 10%/10%/80%. (A split sketch follows the table.)
Hardware Specification | Yes | All experiments were conducted on a device equipped with an Intel 12400 CPU and an NVIDIA RTX 3090 GPU.
Software Dependencies | No | The paper mentions using the 'Python library torch_geometric.dataset' and 'PyTorch' but does not specify their version numbers or the versions of any other key software dependencies.
Experiment Setup | Yes | For FUG in all scenarios, we use a two-layer GCN as the graph encoder. We chose Adam as the optimizer; the learning rate is set to 0.00001, the weight decay is set to 0.00001, and PReLU is selected as the activation function. The dimension encoder is an MLP, Linear(PReLU(Linear(·))). The number of sampled nodes and the dimension of the basis transformation vector are both 1024. In addition, to reduce fluctuation caused by randomness, we directly select the first 1024 nodes as the sampling nodes for all datasets. The random seed is fixed to 66666 in all scenarios. (A configuration sketch follows the table.)
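
For reference, all seven evaluation datasets ship with torch_geometric. A minimal loading sketch, assuming the standard PyG Planetoid, Amazon, and Coauthor loaders (the paper names the library but gives no loading code):

```python
# Minimal sketch: loading the seven evaluation datasets with torch_geometric.
# The class/name mapping below is standard PyG usage, not quoted from the paper.
from torch_geometric.datasets import Planetoid, Amazon, Coauthor

def load_dataset(name: str, root: str = "data"):
    # Cora/CiteSeer/PubMed use the Planetoid loader [11, 12],
    # Photo/Computers use Amazon [38], and CS/Physics use Coauthor [39].
    if name in ("Cora", "CiteSeer", "PubMed"):
        return Planetoid(root=root, name=name)
    if name in ("Photo", "Computers"):
        return Amazon(root=root, name=name)
    if name in ("CS", "Physics"):
        return Coauthor(root=root, name=name)
    raise ValueError(f"Unknown dataset: {name}")

data = load_dataset("Cora")[0]  # a single Data object with x, edge_index, y
```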
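The paper reports only the 10%/10%/80% train/valid/test ratios and the fixed seed, not the splitting code itself; a hypothetical split helper consistent with those details:

```python
import torch

def random_split_masks(num_nodes: int, seed: int = 66666):
    # Hypothetical helper reproducing the reported 10%/10%/80%
    # train/valid/test node split with the paper's fixed seed.
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_nodes, generator=g)
    n_train = int(0.1 * num_nodes)
    n_valid = int(0.1 * num_nodes)
    train_mask = torch.zeros(num_nodes, dtype=torch.bool)
    valid_mask = torch.zeros(num_nodes, dtype=torch.bool)
    test_mask = torch.zeros(num_nodes, dtype=torch.bool)
    train_mask[perm[:n_train]] = True
    valid_mask[perm[n_train:n_train + n_valid]] = True
    test_mask[perm[n_train + n_valid:]] = True
    return train_mask, valid_mask, test_mask
```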
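Finally, a minimal sketch of the reported experiment setup: a two-layer GCN encoder with PReLU, a Linear(PReLU(Linear(·))) dimension encoder mapping into the 1024-dimensional basis-transformation space, Adam with learning rate and weight decay both 1e-5, first-1024-node sampling, and seed 66666. Hidden and output dimensions are placeholders not given in this excerpt:

```python
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv

torch.manual_seed(66666)  # fixed random seed reported in the paper

class GraphEncoder(nn.Module):
    """Two-layer GCN graph encoder with PReLU activation, per the setup."""
    def __init__(self, in_dim: int, hid_dim: int, out_dim: int):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, out_dim)
        self.act = nn.PReLU()

    def forward(self, x, edge_index):
        return self.conv2(self.act(self.conv1(x, edge_index)), edge_index)

def make_dim_encoder(in_dim: int, hid_dim: int, out_dim: int = 1024) -> nn.Sequential:
    # Dimension encoder: Linear(PReLU(Linear(.))), projecting into the
    # 1024-d basis-transformation space reported in the paper.
    return nn.Sequential(nn.Linear(in_dim, hid_dim), nn.PReLU(), nn.Linear(hid_dim, out_dim))

encoder = GraphEncoder(in_dim=1433, hid_dim=512, out_dim=256)  # placeholder dims
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-5, weight_decay=1e-5)

# Deterministic sampling: take the first 1024 nodes on every dataset.
sample_idx = torch.arange(1024)
```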