Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Tighter Privacy Auditing of DP-SGD in the Hidden State Threat Model

Authors: Tudor Cebere, Aurélien Bellet, Nicolas Papernot

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Section 5: PRIVACY AUDITING RESULTS ON REAL DATASETS. Section 5.1: EXPERIMENTAL SETUP. Figure 1: Auditing results for AGC (ours) and AL on ConvNet (Fig. 1a) and ResNets (Fig. 1b) at periodicity k = 1 and C ∈ {1, 2, 4}. In Fig. 1c we present the results for AGC-R (ours), AGC-S (ours) and AL on FCNN (Housing dataset) at periodicity k = 1.
Researcher Affiliation	Academia	Tudor Cebere Inria, Université de Montpellier EMAIL. Aurélien Bellet Inria, Université de Montpellier EMAIL. Nicolas Papernot University of Toronto & Vector Institute EMAIL.
Pseudocode	Yes	Algorithm 1 Privacy auditing. Algorithm 2 Gradient Generation for AGC-R (Random Biased Dimension). Algorithm 3 Noisy gradient generation for AGC-S (Simulated Biased Dimension). Algorithm 4 Privacy auditing with our gradient-crafting adversaries. Algorithm 6 Lower bound search routine over multiple threshold classifiers. Algorithm 7 Rank Sample for ACotS. Algorithm 8 Gradient Generation for ACotS.
Open Source Code	No	The paper does not contain any explicit statement about releasing source code for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets	Yes	Section 5.1: Training details. We perform auditing on two datasets: we choose CIFAR10 (Krizhevsky, 2009)... and Housing (Pace and Barry, 1997).
Dataset Splits	No	The paper mentions using CIFAR10 and Housing datasets and specifies batch sizes, but does not provide specific train/test/validation split percentages, sample counts, or explicit instructions for data partitioning.
Hardware Specification	No	This work was performed using HPC resources from GENCI IDRIS (Grant 2023-AD011014018R1). This statement refers to general HPC resources but does not specify details such as GPU or CPU models, memory, or specific processor types.
Software Dependencies	No	Appendix D: These models were implemented using PyTorch (Paszke et al., 2019), and the DP-SGD optimizer was Opacus (Yousefpour et al., 2021). While PyTorch and Opacus are mentioned, specific version numbers for these software dependencies are not provided.
Experiment Setup	Yes	Section 5.1: We use fixed hyperparameters for each dataset: on CIFAR10, the batch size is 128, and the learning rate is 0.01, while on Housing, the batch size is 400, and the learning rate is 0.1. Training is done with DP-SGD with no momentum. Table 2: Hyperparameters used per model. LEARNING RATE (η) 10⁻², BATCH SIZE 400 (Housing) / 128 (CIFAR10), CLIPPING NORM (C) {1.0, 2.0, 4.0}, NOISE VARIANCE (σ) 4.
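
To make the quoted hyperparameters concrete, the following is a minimal NumPy sketch of a single DP-SGD update (per-sample clipping, Gaussian noise, plain SGD without momentum) using the CIFAR10 values from the Experiment Setup row. It is an illustration only, not the authors' implementation, which the paper says used PyTorch and Opacus. The function names `clip_per_sample` and `dp_sgd_step` are hypothetical, and reading the table's "σ 4" as a noise multiplier (noise standard deviation = σ · C) is an assumption.

```python
import numpy as np

# Values quoted in the Experiment Setup row (CIFAR10 setting).
BATCH_SIZE = 128
LEARNING_RATE = 0.01
CLIP_NORM = 1.0         # one of the reported C values {1.0, 2.0, 4.0}
NOISE_MULTIPLIER = 4.0  # ASSUMPTION: "σ 4" read as noise std = σ * C


def clip_per_sample(grads, clip_norm):
    """Rescale each row (one per-sample gradient) to L2 norm <= clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale


def dp_sgd_step(params, per_sample_grads, rng,
                lr=LEARNING_RATE, clip_norm=CLIP_NORM,
                noise_multiplier=NOISE_MULTIPLIER):
    """One hypothetical DP-SGD update: clip, sum, add noise, average, step."""
    clipped = clip_per_sample(per_sample_grads, clip_norm)
    # Gaussian noise is added to the summed clipped gradients.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / per_sample_grads.shape[0]
    # Plain SGD step, no momentum, as stated in Section 5.1.
    return params - lr * noisy_mean
```

For example, calling `dp_sgd_step(np.zeros(5), grads, np.random.default_rng(0))` with a `(128, 5)` array `grads` returns an updated parameter vector of shape `(5,)`. Swapping in `lr=0.1` and a batch of 400 rows would mirror the Housing setting quoted above.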