Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unifying Re-Identification, Attribute Inference, and Data Reconstruction Risks in Differential Privacy

Authors: Bogdan Kulynych, Juan Felipe Gomez, Georgios Kaissis, Jamie Hayes, Borja Balle, Flavio Calmon, Jean Raisaro

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, our results are tighter than prior methods using ε-DP, Rényi DP, and concentrated DP. As a result, calibrating noise using our bounds can reduce the required noise by 20% at the same risk level, which yields, e.g., an accuracy increase from 52% to 70% in a text classification task. Overall, this unifying perspective provides a principled framework for interpreting and calibrating the degree of protection in DP against specific levels of re-identification, attribute inference, or data reconstruction risk.
Researcher Affiliation	Collaboration	Bogdan Kulynych (Lausanne University Hospital); Juan Felipe Gomez (Harvard University); Georgios Kaissis (Google DeepMind); Jamie Hayes (Google DeepMind); Borja Balle (Google DeepMind); Flavio P. Calmon (Harvard University); Jean Louis Raisaro (Lausanne University Hospital, University of Lausanne)
Pseudocode	No	The paper describes DP-SGD as an algorithm in text (e.g., "complex algorithms such as DP-SGD (Abadi et al., 2016)"), but does not contain a specific pseudocode or algorithm block within its content.
Open Source Code	Yes	We release the code as part of the Python package: https://github.com/Felipe-Gomez/riskcal
Open Datasets	Yes	language model for text sentiment classification on the SST-2 dataset (Socher et al., 2013) from the GLUE benchmark (Wang et al., 2018). We use CIFAR-10 (Krizhevsky et al., 2009) image classification dataset with a default split. The US Census bureau has released the 2020 Census data using DP (Abowd et al., 2022).
Dataset Splits	Yes	We use CIFAR-10 (Krizhevsky et al., 2009) image classification dataset with a default split. We fine-tune a GPT-2 (small) (Radford et al., 2019) using LoRA (Yu et al., 2021) with DP-SGD on the SST-2 sentiment classification task (Socher et al., 2013) from the GLUE benchmark (Wang et al., 2018).
Hardware Specification	Yes	We use an Nvidia GeForce RTX 4070 16 GB GPU machine for the deep learning experiments.
Software Dependencies	No	PyTorch (Paszke et al., 2019) for implementing neural networks. opacus (Yousefpour et al., 2021) for training PyTorch neural networks with DP-SGD. numpy (Harris et al., 2020), pandas (pandas development team, 2020), and jupyter (Kluyver et al., 2016) for numeric analyses. seaborn (Waskom, 2021) for visualizations.
Experiment Setup	Yes	Text sentiment classification case study details. Parameters: Poisson subsampling probability 0.004; expected batch size 256; gradient noise multiplier (σ) {0.5715, 0.6072, 0.6366, 0.6945, 0.7498}; privacy budget (ε) at δ = 10^-5 {3.95, 3.2, 2.7, 1.9, 1.45}; training epochs 3; gradient clipping norm ( 2) 1.0; LoRA dimension 4; LoRA scaling factor 32. Image classification case study details. Parameters: Poisson subsampling probability 0.16; expected batch size 8192; gradient noise multiplier (σ/ 2) {4, 5, 6, 8, 10}; training epochs 100; gradient clipping norm ( 2) 0.1; learning rate 4; momentum (Nesterov) 0.9.
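
The Experiment Setup row reports both a Poisson subsampling probability and an expected batch size for each case study. Under Poisson subsampling these are linked by q = expected batch size / training-set size, so the reported values can be cross-checked. The sketch below does this in plain Python. The training-set sizes (67,349 for SST-2 and 50,000 for CIFAR-10) are assumptions based on the standard splits and are not stated in the table; the helper names `sample_rate` and `expected_steps` are hypothetical and do not come from the authors' code.

```python
# Cross-check: Poisson subsampling probability vs. expected batch size.
# Dataset sizes are ASSUMED (standard training splits), not taken from the table.
SST2_TRAIN_SIZE = 67_349     # assumed: GLUE SST-2 training split
CIFAR10_TRAIN_SIZE = 50_000  # assumed: CIFAR-10 default training split


def sample_rate(expected_batch_size: int, dataset_size: int) -> float:
    """Poisson subsampling probability q = expected batch size / dataset size."""
    return expected_batch_size / dataset_size


def expected_steps(epochs: int, q: float) -> int:
    """Approximate DP-SGD step count: one epoch is about 1/q sampled batches."""
    return round(epochs / q)


# Values reported in the Experiment Setup row.
configs = {
    "sst2": {"batch": 256, "n": SST2_TRAIN_SIZE, "epochs": 3, "reported_q": 0.004},
    "cifar10": {"batch": 8192, "n": CIFAR10_TRAIN_SIZE, "epochs": 100, "reported_q": 0.16},
}

results = {}
for name, cfg in configs.items():
    q = sample_rate(cfg["batch"], cfg["n"])
    steps = expected_steps(cfg["epochs"], q)
    results[name] = {"q": q, "steps": steps}
    print(f"{name}: q = {q:.4f} (reported {cfg['reported_q']}), about {steps} steps")
```

If the assumed dataset sizes are right, both computed rates round to the reported probabilities, which suggests the table lists rounded values. The step counts are only an estimate of what a privacy accountant would be given; the authors' exact accounting may differ.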