Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Differentially Private Relational Learning with Entity-level Privacy Guarantees

Authors: Yinan Huang, Haoteng Yin, Eli Chien, Rongzhe Wei, Pan Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on fine-tuning text encoders over text-attributed network-structured relational data demonstrate the strong utility-privacy trade-offs of our approach. Our code is available at https://github.com/Graph-COM/Node_DP. In this section, we empirically evaluate the privacy and utility characteristics of our proposed method. First, we numerically compute and compare the privacy bounds (Eq.(3)). Second, we consider an application of finetuning text encoders on relational data to evaluate the privacy-utility trade-offs.
Researcher Affiliation	Academia	Yinan Huang (Georgia Institute of Technology); Haoteng Yin (Purdue University); Eli Chien (Georgia Institute of Technology); Rongzhe Wei (Georgia Institute of Technology); Pan Li (Georgia Institute of Technology)
Pseudocode	Yes	Algorithm 1: Frequency-based Adaptive Gradient Clipping (FREQ-CLIP); Algorithm 2: Negative Sampling Without Replacement (NEG-SAMPLE-WOR); Algorithm 3: Relational Learning with Entity-level Differential Privacy
Open Source Code	Yes	Our code is available at https://github.com/Graph-COM/Node_DP.
Open Datasets	Yes	We adopt two pre-trained language models BERT [76] and Llama2 [77] as the text encoders, and four text-attributed graphs from two subdomain pairs: two citation networks (MAG-CHN, MAG-USA) [78] and two co-purchase networks (AMZA-Sports, AMZA-Cloth) [79].
Dataset Splits	Yes	The models are first privately fine-tuned on one network (e.g., MAG-CHN), and then its utility is evaluated on another same-domain network (e.g., MAG-USA). We report the overall privacy loss in terms of (ε, δ)-DP, where δ is set to 1/|E_train|, the size of the relation set used for training after degree capping. Table 2: Dataset statistics and experimental setup for evaluation (columns: Dataset, #Entity, #Relation, #Entity (Test), #Classes, #Relation (Test), Test Domain).
Hardware Specification	Yes	We use a server with two AMD EPYC 7543 CPUs, 512GB DRAM, and NVIDIA Quadro RTX 6000 (24GB) GPUs for experiments of BERT-based models and A100 (80GB) GPUs for Llama2-7B models.
Software Dependencies	Yes	The codebase is built on PyTorch 2.1.2, Transformers 4.23.0, PEFT 0.10.0, and Opacus 1.4.1.
Experiment Setup	Yes	Each LLM is fine-tuned through the proposed Algorithm 3 using the InfoNCE loss. The overall privacy loss is tracked through Theorem 4.1 and converted to (ε, δ)-DP, where δ is set to 1/|E_train|, the size of the relation set used for training after degree capping. Note that we actually adopt Adam optimizer to update model parameters in Algorithm 3, which has the same privacy guarantee of SGD-style update in the original form, due to the post-processing property of DP [82, 26]. The maximal node degree is capped by K = 5 according to Algorithm 3. Parameters. We set the parameters based on the real-world graph statistics (Appendix E, Table 2) we will use in the latter experiments. By default: number of nodes n = 10^6, number of edges m = 5 × 10^6, capped node degree K = 5, sampling rate γ = 10^-5, Gaussian noise levels σ = 0.5 and number of negative edges per positive edge k_neg = 4. The clipping threshold C is set to 1.
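
The default parameters in the Experiment Setup row can be collected in a small Python sketch, which may help when checking a re-run against the reported configuration. This is an illustration only, not code from the authors' repository: the names PrivacySetup, expected_positive_edges, delta_for, and cap_degree are hypothetical. Two points are assumptions: that γ is the fraction of the m positive edges drawn per step, and that degree capping can be approximated by a greedy edge filter. Algorithm 3 in the paper may cap degrees differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrivacySetup:
    """Default values quoted in the Experiment Setup row (hypothetical container)."""
    num_nodes: int = 10**6            # n
    num_edges: int = 5 * 10**6        # m
    degree_cap: int = 5               # K
    sampling_rate: float = 1e-5       # gamma
    noise_level: float = 0.5          # sigma
    negatives_per_positive: int = 4   # k_neg
    clip_threshold: float = 1.0       # C


def expected_positive_edges(cfg: PrivacySetup) -> float:
    # Assumption: gamma is the fraction of the m positive edges drawn per step.
    return cfg.sampling_rate * cfg.num_edges


def delta_for(num_train_relations: int) -> float:
    # delta = 1 / |E_train|, where E_train is the relation set after degree capping.
    if num_train_relations <= 0:
        raise ValueError("the training relation set must be non-empty")
    return 1.0 / num_train_relations


def cap_degree(edges, k):
    """Greedy degree capping (an assumed stand-in for the paper's procedure).

    An edge is kept only while both of its endpoints have fewer than k kept edges.
    """
    degree = {}
    kept = []
    for u, v in edges:
        if degree.get(u, 0) < k and degree.get(v, 0) < k:
            kept.append((u, v))
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
    return kept


if __name__ == "__main__":
    cfg = PrivacySetup()
    print("expected positive edges per step:", round(expected_positive_edges(cfg)))

    # Toy star graph: node 0 has 8 neighbours, so capping at K = 5 drops 3 edges.
    star = [(0, i) for i in range(1, 9)]
    train_edges = cap_degree(star, cfg.degree_cap)
    print("edges kept after capping:", len(train_edges))
    print("delta:", delta_for(len(train_edges)))
```

Under the stated assumption about γ, the defaults give about 50 positive edges per step in expectation (10^-5 × 5 × 10^6). The actual δ for each dataset depends on the size of its capped training relation set, which this record does not report.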