Topology-Imbalance Learning for Semi-Supervised Node Classification
Authors: Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie Zhou, Xu Sun
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Abstract: "Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. The further analysis unveils varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures." Section 3 (Experiments): "In this section, we will first introduce the experimental datasets for both transductive and inductive semi-supervised node classification. Then we introduce the experiments to verify the effectiveness of the proposed ReNode method in three different imbalance situations: (1) TINL only, (2) TINL and QINL, (3) large-scale graph." Table 1 caption: "ReNode (short as RN) for the pure topology-imbalance issue. We report Weighted-F1 (W-F, %), Macro-F1 (M-F, %), and the corresponding standard deviation for each group of experiments; markers denote results significant in a Student's t-test at p < 0.05 and p < 0.01, respectively." |
| Researcher Affiliation | Collaboration | Deli Chen1,2, Yankai Lin1, Guangxiang Zhao2, Xuancheng Ren2, Peng Li1, Jie Zhou1, Xu Sun2 — 1Pattern Recognition Center, WeChat AI, Tencent Inc., China; 2MOE Key Lab of Computational Linguistics, School of EECS, Peking University |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | "The code is available at https://github.com/victorchen96/ReNode." |
| Open Datasets | Yes | "For the transductive setting [13], we take the widely-used Planetoid paper citation graphs [33] (CORA, CiteSeer, PubMed) and the Amazon co-purchase graphs [24] (Photo, Computers) to verify the effectiveness of our method. For the inductive setting, we conduct experiments on the popular Reddit dataset [13] and the enormous MAG-Scholar dataset (coarse-grain version) [2], which has millions of nodes and features." |
| Dataset Splits | Yes | "Following the most widely-used semi-supervised setting in node classification studies [47, 18], we randomly select 20 nodes in each class for training and 30 nodes per class for validation; all the remaining nodes form the test set." |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory specifications) used for running experiments is explicitly mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | The training loss LT from Section 2.4 is adopted... Appendix A: "All GNNs are with 2 layers, 16 hidden units, 0.5 dropout, 5e-4 weight decay, 0.01 learning rate, and 200 training epochs unless otherwise stated. We set the α in Equation 1 as 0.2 and the α in Equation 2 as 0.2. For ReNode-related hyper-parameters, we set wmin=0.1, wmax=1.0. For the quantity-imbalance related experiments, the hyper-parameters for the GCN model (Focal Loss, RW, CB) are set as in their original papers, or we apply grid search to find the best parameters. For DR-GCN, RA-GCN, G-SMOTE, the hyper-parameters are kept as in their original papers." |
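The 20-train / 30-validation per-class protocol quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the flat `labels` list layout and the function name `per_class_split` are assumptions.

```python
import random
from collections import defaultdict

def per_class_split(labels, n_train=20, n_val=30, seed=0):
    """Per-class split: n_train and n_val nodes per class, rest is test.

    labels: one class id per node (hypothetical layout; the paper only
    states the 20/30-per-class protocol, not the data structure).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random selection within each class
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

Note that this protocol fixes the *number* of labeled nodes per class (removing quantity imbalance) but not their *topological position*, which is exactly the residual imbalance the paper studies.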
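The Experiment Setup row reports wmin=0.1 and wmax=1.0 as the bounds of ReNode's per-node training weights. One plausible way to map a labeled node's topological-position rank into that range is cosine annealing; the sketch below illustrates only that mapping, under assumptions not taken from the paper (the rank convention and the name `renode_weight` are hypothetical).

```python
import math

def renode_weight(rank, n_labeled, w_min=0.1, w_max=1.0):
    """Cosine-annealed weight in [w_min, w_max].

    rank 0 (assumed best topological position) maps to w_max;
    the last rank (n_labeled - 1) maps to w_min.
    """
    frac = rank / max(n_labeled - 1, 1)
    return w_min + 0.5 * (w_max - w_min) * (1 + math.cos(math.pi * frac))
```

Under this convention the weight would multiply each labeled node's term in the supervised loss, down-weighting poorly positioned labeled nodes instead of discarding them.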