Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Geometric Imbalance in Semi-Supervised Node Classification

Authors: Liang Yan, Shengzhong Zhang, Bisheng Li, Menglin Yang, Chen Yang, Min Zhou, Weiyang Ding, Yutong Xie, Zengfeng Huang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on diverse benchmarks show that our approach consistently outperforms existing methods, especially under severe class imbalance.
Researcher Affiliation	Collaboration	1 Fudan University; 2 Shanghai Innovation Institute; 3 MBZUAI; 4 Hong Kong University of Science and Technology (Guangzhou); 5 Logs AI
Pseudocode	Yes	Algorithm 1 Our Algorithm
Open Source Code	Yes	Code: https://github.com/yanliang3612/UNREAL
Open Datasets	Yes	We conduct evaluations under various benchmarking settings on 8 datasets Cora [68], Citeseer [68], Pubmed [68], Amazon-Computers [43], Computers-Random [67], CS-Random [67], Flickr [69], and Ogbn-arxiv [17]
Dataset Splits	Yes	For the citation networks (Cora, Citeseer, and Pubmed), we use the standard splits from [68] to create imbalance settings with ρ = 10 and ρ = 20. For more extreme imbalances (ρ = 50 and 100), which require more labeled nodes per class, we adopt random splits. For Amazon-Computers, we generate splits with varying degrees of class imbalance (ρ = 10, 20, 50, 100) based on the procedure in [67]. For Flickr and Ogbn-arxiv, we adopt their publicly available splits, as the settings are inherently highly imbalanced. Appendix K details our experimental framework, including label distributions, evaluation protocols, and algorithm implementations.
Hardware Specification	Yes	Experiments are conducted on a server with an NVIDIA 3090 GPU (24 GB memory) and an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz.
Software Dependencies	No	All the algorithms and models are implemented in Python and PyTorch Geometric.
Experiment Setup	Yes	We adopt Adam [24] optimizer with an initial learning rate of 0.01 or 0.005. We follow [47] to devise a scheduler, which cuts the learning rate by half if there is no decrease in validation loss for 100 consecutive epochs. All learnable parameters in the model adopt weight decay with a rate of 0.0005. For the first training iteration, we train the model for 200 epochs using the original training set for Cora, CiteSeer, PubMed, or Amazon-Computers. For Flickr, we train for 2000 epochs in the first iteration. We train models for 2000 epochs in the rest of the iteration with the above optimizer and scheduler. The best models are selected based on validation accuracy. Early stopping is used with patience set to 300.
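
The schedule quoted in the Experiment Setup row can be sketched in plain Python. This is an illustrative sketch only, not the authors' implementation: the class name `PlateauSchedule` and its fields are hypothetical, and it assumes that early stopping monitors validation accuracy, which the quoted text does not state. The optimizer itself (Adam with weight decay 0.0005) is not modeled; only the learning-rate halving and the stopping rule are.

```python
from dataclasses import dataclass


@dataclass
class PlateauSchedule:
    """Hypothetical helper: halve the LR on validation-loss plateaus
    and signal early stopping. Not taken from the paper's code."""

    lr: float = 0.01           # initial learning rate (0.01 or 0.005 in the quoted setup)
    lr_patience: int = 100     # epochs without a validation-loss decrease before halving
    stop_patience: int = 300   # early-stopping patience
    best_loss: float = float("inf")
    best_acc: float = float("-inf")
    best_epoch: int = -1       # epoch of the best validation accuracy so far
    epochs_since_loss_drop: int = 0
    epochs_since_best_acc: int = 0

    def step(self, epoch: int, val_loss: float, val_acc: float) -> bool:
        """Record one epoch's validation metrics; return True to stop training."""
        # Learning-rate rule: cut the rate by half after `lr_patience`
        # consecutive epochs with no decrease in validation loss.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.epochs_since_loss_drop = 0
        else:
            self.epochs_since_loss_drop += 1
            if self.epochs_since_loss_drop >= self.lr_patience:
                self.lr *= 0.5
                self.epochs_since_loss_drop = 0

        # Model selection: remember the epoch with the best validation accuracy.
        # Assumption: early stopping counts epochs since that best accuracy.
        if val_acc > self.best_acc:
            self.best_acc = val_acc
            self.best_epoch = epoch
            self.epochs_since_best_acc = 0
        else:
            self.epochs_since_best_acc += 1
        return self.epochs_since_best_acc >= self.stop_patience
```

In a training loop, `step` would be called once per epoch after validation, with the returned flag ending the run and `best_epoch` identifying the checkpoint to keep. In PyTorch, a similar halving rule is available through `torch.optim.lr_scheduler.ReduceLROnPlateau` with `factor=0.5`, though its patience counting may differ by one epoch from this sketch.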