Topology-Imbalance Learning for Semi-Supervised Node Classification
Authors: Deli Chen, Yankai Lin, Guangxiang Zhao, Xuancheng Ren, Peng Li, Jie Zhou, Xu Sun
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Abstract: "Systematic experiments demonstrate the effectiveness and generalizability of our method in relieving the topology-imbalance issue and promoting semi-supervised node classification. The further analysis unveils varied sensitivity of different graph neural networks (GNNs) to topology imbalance, which may serve as a new perspective in evaluating GNN architectures." Section 3 (Experiments): "In this section, we will first introduce the experimental datasets for both transductive and inductive semi-supervised node classification. Then we introduce the experiments to verify the effectiveness of the proposed ReNode method in three different imbalance situations: (1) TINL only, (2) TINL and QINL, (3) large-scale graph." Table 1 caption: "ReNode (short as RN) for the pure topology-imbalance issue. We report Weighted-F1 (W-F, %), Macro-F1 (M-F, %), and the corresponding standard deviation for each group of experiments; markers denote results significant in a Student's t-test at p < 0.05 and p < 0.01, respectively." |
| Researcher Affiliation | Collaboration | Deli Chen1,2, Yankai Lin1, Guangxiang Zhao2, Xuancheng Ren2, Peng Li1, Jie Zhou1, Xu Sun2 — 1Pattern Recognition Center, WeChat AI, Tencent Inc., China; 2MOE Key Lab of Computational Linguistics, School of EECS, Peking University |
| Pseudocode | No | No pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | "The code is available at https://github.com/victorchen96/ReNode." |
| Open Datasets | Yes | "For the transductive setting [13], we take the widely-used Planetoid paper citation graphs [33] (CORA, CiteSeer, PubMed) and the Amazon co-purchase graphs [24] (Photo, Computers) to verify the effectiveness of our method. For the inductive setting, we conduct experiments on the popular Reddit dataset [13] and the enormous MAG-Scholar dataset (coarse-grain version) [2], which has millions of nodes and features." |
| Dataset Splits | Yes | "Following the most widely-used semi-supervised setting in node classification studies [47, 18], we randomly select 20 nodes in each class for training and 30 nodes per class for validation; all the remaining nodes form the test set." |
| Hardware Specification | No | No specific hardware (e.g., GPU/CPU models, memory specifications) used for running experiments is explicitly mentioned in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | The training loss LT from Section 2.4 is adopted... Appendix A: "All GNNs are with 2 layers, 16 hidden units, 0.5 dropout, 5e-4 weight decay, 0.01 learning rate, and 200 training epochs unless otherwise stated. We set the α in Equation 1 as 0.2 and the α in Equation 2 as 0.2. For ReNode-related hyper-parameters, we set wmin=0.1, wmax=1.0. For the quantity-imbalance related experiments, the hyper-parameters for the GCN model (Focal Loss, RW, CB) are set as in their original papers, or we apply grid search to find the best parameters. For DR-GCN, RA-GCN, G-SMOTE, the hyper-parameters are kept as in their original papers." |
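The 20-train / 30-validation per-class protocol quoted in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the flat `labels` list layout and the function name `per_class_split` are assumptions.

```python
import random
from collections import defaultdict

def per_class_split(labels, n_train=20, n_val=30, seed=0):
    """Per-class split: n_train and n_val nodes per class, rest is test.

    labels: one class id per node (hypothetical layout; the paper only
    states the 20/30-per-class protocol, not the data structure).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, val, test = [], [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)  # random selection within each class
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]
    return train, val, test
```

Note that this protocol fixes the *number* of labeled nodes per class (removing quantity imbalance) but not their *topological position*, which is exactly the residual imbalance the paper studies.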
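The Experiment Setup row reports wmin=0.1 and wmax=1.0 as the bounds of ReNode's per-node training weights. One plausible way to map a labeled node's topological-position rank into that range is cosine annealing; the sketch below illustrates only that mapping, under assumptions not taken from the paper (the rank convention and the name `renode_weight` are hypothetical).

```python
import math

def renode_weight(rank, n_labeled, w_min=0.1, w_max=1.0):
    """Cosine-annealed weight in [w_min, w_max].

    rank 0 (assumed best topological position) maps to w_max;
    the last rank (n_labeled - 1) maps to w_min.
    """
    frac = rank / max(n_labeled - 1, 1)
    return w_min + 0.5 * (w_max - w_min) * (1 + math.cos(math.pi * frac))
```

Under this convention the weight would multiply each labeled node's term in the supervised loss, down-weighting poorly positioned labeled nodes instead of discarding them.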