Rethinking Semi-Supervised Imbalanced Node Classification from Bias-Variance Decomposition

Authors: Divin Yan, Gengchen Wei, Chen Yang, Shengzhong Zhang, Zengfeng Huang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Exhaustive tests are conducted on multiple benchmarks, including naturally imbalanced datasets and public-split class-imbalanced datasets, demonstrating that our approach outperforms state-of-the-art methods in various imbalanced scenarios."
Researcher Affiliation | Academia | "Fudan University, {yanl21, gcwei22, yanc22}@m.fudan.edu.cn, {szzhang17, huangzf}@fudan.edu.cn"
Pseudocode | Yes | "Algorithm 1 ReVar"
Open Source Code | Yes | "The model implementation and data is released at https://github.com/yanliang3612/ReVar."
Open Datasets | Yes | "We have demonstrated the efficacy of our method on five commonly used benchmark datasets across various imbalance scenarios. For the conventional setting (ρ=10) of imbalanced node classification in [50, 24, 31], we conducted experiments on Cora, CiteSeer, PubMed, and Amazon-Computers." (A rough illustration of a ρ=10 labeled-set construction appears after the table.)
Dataset Splits | Yes | "We create a random validation set that contains 30 nodes in each class, and the remaining nodes are used as the testing set." (See the validation/test split sketch after the table.)
Hardware Specification | Yes | "Experiments are conducted on a server with an NVIDIA 3090 GPU (24 GB memory) and an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz."
Software Dependencies | No | "All the algorithms and models are implemented in Python and PyTorch Geometric."
Experiment Setup | Yes | "We utilized the Adam optimizer [15] with an initial learning rate of 0.01 or 0.005. To manage the learning rate, we employed a scheduler based on the approach outlined in [31], which reduced the learning rate by half when there was no decrease in validation loss for 100 consecutive epochs. Weight decay with a rate of 0.0005 was applied to all learnable parameters in the model. In the initial training iteration, we trained the model for 200 epochs using the original training set for Cora, CiteSeer, PubMed, or Amazon-Computers. However, for Flickr, the training was extended to 2000 epochs in the first iteration. Subsequently, in the remaining iterations, we trained the models for 2000 epochs using the aforementioned optimizer and scheduler. The best models were selected based on validation accuracy, and we employed early stopping with a patience of 300 epochs." (See the training-loop sketch after the table.)
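
The conventional ρ=10 protocol quoted under "Open Datasets" is commonly realized as a step-imbalanced labeled set. The sketch below is a rough illustration only, not the authors' released code: the per-class label budget `n_max`, the default choice of minority classes, and the helper name `make_imbalanced_train_mask` are assumptions made for illustration.

```python
import torch

def make_imbalanced_train_mask(labels, n_max=20, rho=10, minority_classes=None, seed=0):
    """Build a step-imbalanced training mask: majority classes keep n_max labeled
    nodes, minority classes keep n_max // rho. All budgets here are illustrative."""
    g = torch.Generator().manual_seed(seed)
    train_mask = torch.zeros(labels.size(0), dtype=torch.bool)
    classes = labels.unique().tolist()
    if minority_classes is None:
        # Assumption for illustration: treat the second half of the classes as minorities.
        minority_classes = set(classes[len(classes) // 2:])
    for c in classes:
        idx = (labels == c).nonzero(as_tuple=True)[0]
        perm = idx[torch.randperm(idx.numel(), generator=g)]
        budget = n_max // rho if c in minority_classes else n_max
        train_mask[perm[:budget]] = True
    return train_mask
```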
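For the quoted validation/test protocol (30 validation nodes per class, remainder as the test set), a minimal sketch assuming node labels and an existing training mask as plain PyTorch tensors; the function name and seeding are illustrative, not taken from the repository.

```python
import torch

def make_val_test_masks(labels, train_mask, val_per_class=30, seed=0):
    """Sample 30 nodes per class (outside the training set) for validation;
    every remaining non-training node becomes part of the test set."""
    g = torch.Generator().manual_seed(seed)
    val_mask = torch.zeros_like(train_mask)
    for c in labels.unique().tolist():
        idx = ((labels == c) & ~train_mask).nonzero(as_tuple=True)[0]
        perm = idx[torch.randperm(idx.numel(), generator=g)]
        val_mask[perm[:val_per_class]] = True
    test_mask = ~train_mask & ~val_mask
    return val_mask, test_mask
```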
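The quoted experiment setup maps onto a standard PyTorch training loop. The sketch below assumes `ReduceLROnPlateau` reproduces the described "halve the learning rate after 100 epochs without validation-loss improvement" rule, and treats the ReVar model and its objectives as caller-supplied placeholders (`train_step`, `eval_step`); it is not the authors' implementation.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

def train_loop(model, train_step, eval_step, lr=0.01, epochs=2000, es_patience=300):
    """Generic loop mirroring the quoted setup. `train_step(model)` returns the
    training loss; `eval_step(model)` returns (val_loss, val_acc). Both are
    placeholders for the actual ReVar objectives."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=5e-4)
    # Halve the learning rate after 100 epochs without a drop in validation loss.
    scheduler = ReduceLROnPlateau(optimizer, mode="min", factor=0.5, patience=100)

    best_val_acc, bad_epochs, best_state = 0.0, 0, None
    for epoch in range(epochs):
        model.train()
        optimizer.zero_grad()
        loss = train_step(model)
        loss.backward()
        optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss, val_acc = eval_step(model)
        scheduler.step(val_loss)

        if val_acc > best_val_acc:  # model selection on validation accuracy
            best_val_acc, bad_epochs = val_acc, 0
            best_state = {k: v.clone() for k, v in model.state_dict().items()}
        else:
            bad_epochs += 1
            if bad_epochs >= es_patience:  # early stopping, patience 300
                break
    if best_state is not None:
        model.load_state_dict(best_state)
    return model
```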