Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Preference-driven Knowledge Distillation for Few-shot Node Classification

Authors: Xing Wei, Chunchun Chen, Rui Fan, Xiaofeng Cao, Sourav Medya, Wei Ye

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments validate the efficacy of our proposed framework in few-shot node classification on real-world TAGs.
Researcher Affiliation	Academia	1 College of Electronic and Information Engineering, Tongji University, China 2 Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University, China 3 National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, China 4 School of Computer Science and Technology, Tongji University, China 5 Department of Computer Science, University of Illinois Chicago, USA EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: The training of PKD. Input: $G_T = (V, E, X, A, T)$, training dataset with true labels $D_L$, teacher GNNs $\{T_b\}_{b=1}^{4}$ with parameters $\{f_\theta^{T_b}\}_{b=1}^{4}$, student GNN $S$ with parameter $f_\theta^{S}$, fine-tuned LLM $\mathrm{LLM}_\theta$, Policy Model $f_\theta^{A}$, Value Model $f_\phi^{V}$, epoch number of RL $L_1$. Output: The expanded training dataset $D_L$, optimized parameters $\mathrm{LLM}_\theta$, $f_\theta^{S}$, $f_\theta^{A}$, $f_\phi^{V}$, and predicted labels $y$.
Open Source Code	Yes	Our code is available at https://github.com/GEEX-Weixing/PKD.
Open Datasets	Yes	Datasets: In order to assess the few-shot node classification performance of our method on TAGs, we conduct a comprehensive series of experiments across 9 real-world datasets: CORNELL, WASHINGTON, TEXAS, WISCONSIN [25], AMAZON RATINGS [42], OGBN-ARXIV [43], WIKI CS [44], PUBMED, CORA [45]. They have various 1-hop homophily ratios [46] and additional details of the datasets can be found in Appendix A.
Dataset Splits	Yes	For the KD-baselines, we partition the nodes of each graph into training, validation, and test sets, allocating 48%, 32%, and 20%, respectively, based on the proportion division mentioned in [47]. For PKD and other baselines, we randomly select 1, 3, and 5 labeled nodes per class as the initial training data and then expand the dataset to 48% of the total using the GNS module. The remaining data is randomly split into 32% for validation and 20% for testing, with the preserved indices for the baselines. This operation is repeated 5 times.
Hardware Specification	Yes	We conduct all experiments on the NVIDIA A800-SXM4-80GB GPU and Intel(R) Xeon(R) CPU Max 9468.
Software Dependencies	Yes	Specifically, we implement our proposed PKD with PyTorch (2.5.1) [59], PyTorch Geometric (2.6.1) [60], Python (3.10.16), Transformers (4.50.3), and vLLM (0.7.0).
Experiment Setup	Yes	Specifically, the following hyper-parameter values are utilized: the hidden dimension is set to 128. We use ReLU activation functions in all our baseline models. The Adam optimizer is utilized with a learning rate of 1e-2 and weight decay of 5e-4. We train each baseline for 600 steps and select the best step based on the validation accuracy. In our proposed method, we train the student 5 epochs after GNN selection driven by node attributes every time and train the agent 200 epochs. The other weight hyper-parameters are set as follows: α = 0.5, β = 1, γ = 0.1, η = 0.3, c1 = 0.5, c2 = 0.01, ϵ = 0.2.
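
The split protocol quoted in the Dataset Splits row can be illustrated with a small sketch. This is not the authors' code: the function names `proportional_split` and `few_shot_seed_nodes`, the use of Python's standard `random` module, and the toy labels are all assumptions made for illustration. It covers only the random 48%/32%/20% partition, the selection of 1, 3, or 5 labeled nodes per class, and the 5 repetitions. The expansion of the training set by the GNS module is specific to the paper and is not reproduced here.

```python
import random
from collections import defaultdict


def proportional_split(num_nodes, seed, fractions=(0.48, 0.32, 0.20)):
    """Randomly partition node indices into train/val/test sets.

    The test set takes whatever remains after train and val, so the three
    sets always cover every node exactly once.
    """
    rng = random.Random(seed)
    indices = list(range(num_nodes))
    rng.shuffle(indices)
    n_train = round(fractions[0] * num_nodes)
    n_val = round(fractions[1] * num_nodes)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


def few_shot_seed_nodes(labels, shots, seed):
    """Pick `shots` labeled nodes per class as the initial training data.

    Raises ValueError if a class has fewer than `shots` nodes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for node, label in enumerate(labels):
        by_class[label].append(node)
    chosen = []
    for label in sorted(by_class):
        chosen.extend(rng.sample(by_class[label], shots))
    return sorted(chosen)


if __name__ == "__main__":
    # Toy labels (hypothetical): 1000 nodes spread evenly over 5 classes.
    labels = [i % 5 for i in range(1000)]
    for run in range(5):  # the quoted protocol repeats the split 5 times
        train, val, test = proportional_split(len(labels), seed=run)
        seeds = few_shot_seed_nodes(labels, shots=3, seed=run)
        print(run, len(train), len(val), len(test), len(seeds))
```

Seeding each run separately keeps the five repetitions reproducible, which matches the row's note that split indices are preserved for the baselines.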