Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

How Well Can Differential Privacy Be Audited in One Run?

Authors: Amit Keinan, Moshe Shenfeld, Katrina Ligett

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In Section 6 we explore, both theoretically and empirically, auditing of the most important DP algorithm for learning, DP-SGD, as a case study of ORA. Their work leaves open the question of how precisely one-run auditing can uncover the true privacy parameter of an algorithm, and how that precision depends on the audited algorithm. Our experiments are designed to empirically illustrate the theoretical insights and provide additional intuition, not to serve as a comprehensive evaluation in realistic settings.
Researcher Affiliation	Academia	Amit Keinan Moshe Shenfeld Katrina Ligett Department of computer science and engineering The Hebrew University of Jerusalem EMAIL
Pseudocode	Yes	Algorithm 1 One-Run Auditor 1: Input: algorithm M : X^n → O, pair vector Z = (x_1, y_1, ..., x_n, y_n) ∈ X^{2n} such that for all i ∈ [n], x_i ≠ y_i, guesser G : O → {-1, 0, 1}^n. 2: for i = 1 to n do 3: Sample S_i ∈ {-1, +1} uniformly. 4: end for 5: Define a dataset D ∈ X^n by D_i = x_i if S_i = -1, y_i if S_i = +1. 6: Compute o = M(D). 7: Guess T = G(o) ∈ {-1, 0, 1}^n. 8: Count the numbers of correct guesses v := |{i ∈ [n] : T_i = S_i}| and taken guesses r := |{i ∈ [n] : T_i ≠ 0}|. 9: Return: v, r
Open Source Code	Yes	Code for running the experiments is available at https://github.com/amitkeinan1/exploring-one-run-auditing-of-dp.
Open Datasets	No	Our experiments do not use data.
Dataset Splits	No	Our experiments do not involve real data.
Hardware Specification	No	No such information is needed since the experiments do not require any special resources and can be executed on a standard personal laptop (we do not train ML models).
Software Dependencies	No	The paper does not explicitly state software dependencies with version numbers.
Experiment Setup	Yes	We audit DP-SGD with dimension d = 1000 for T = 100 steps, with sample rate 1/10. Since the adversary knows the parameters, the values of the clipping threshold and the learning rate do not affect the auditing. We fix ε = 2 and δ = 10^-5, and use an RDP accountant to compute the noise scale. We set the auditing gradients as described above: each of them is nonzero in exactly one index. When n ≤ d there is no overlap between them, and when n > d we choose an equal number of elements for each index. Our guesser sorts the elements by the value of the update's gradient at the coordinate in which the auditing gradient is non-zero; when committed to taking k guesses, it guesses 1 for the highest k/2 elements, and -1 for the lowest k/2 elements.
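
The one-run auditing loop in the Pseudocode row and the sorting guesser in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not run DP-SGD: the mechanism `noisy_release` (a per-element Gaussian release), the choice of pairs (x_i, y_i) = (-1, +1), and all function names are hypothetical stand-ins chosen only to make the example self-contained. The sketch assumes the convention D_i = x_i when S_i = -1 and D_i = y_i when S_i = +1.

```python
import random


def one_run_audit(mechanism, pairs, guesser, rng):
    """Sketch of the one-run auditor: returns (v, r).

    v = number of correct guesses, r = number of guesses taken (non-zero).
    """
    n = len(pairs)
    # Sample S_i uniformly from {-1, +1} for every pair.
    s = [rng.choice((-1, 1)) for _ in range(n)]
    # Assumed convention: D_i = x_i if S_i = -1, else y_i.
    dataset = [x if s_i == -1 else y for (x, y), s_i in zip(pairs, s)]
    output = mechanism(dataset)
    t = guesser(output)  # each T_i is in {-1, 0, +1}; 0 means "abstain"
    v = sum(1 for t_i, s_i in zip(t, s) if t_i == s_i)
    r = sum(1 for t_i in t if t_i != 0)
    return v, r


def noisy_release(dataset, sigma, rng):
    """Hypothetical toy mechanism: add Gaussian noise to each element."""
    return [value + rng.gauss(0.0, sigma) for value in dataset]


def top_bottom_guesser(output, k):
    """Guess +1 for the highest k/2 values, -1 for the lowest k/2, else 0.

    Assumes k <= len(output); an odd k is rounded down to k - 1 guesses.
    """
    n = len(output)
    order = sorted(range(n), key=lambda i: output[i])
    guesses = [0] * n
    half = k // 2
    for i in order[:half]:
        guesses[i] = -1
    for i in order[n - half:]:
        guesses[i] = 1
    return guesses


if __name__ == "__main__":
    rng = random.Random(0)
    pairs = [(-1.0, 1.0)] * 1000  # hypothetical (x_i, y_i) pairs
    v, r = one_run_audit(
        lambda d: noisy_release(d, sigma=2.0, rng=rng),
        pairs,
        lambda o: top_bottom_guesser(o, k=100),
        rng,
    )
    print(f"correct guesses v = {v}, taken guesses r = {r}")
```

Abstaining on all but the k most extreme outputs lets the guesser commit only where the signal is strongest. The resulting pair (v, r) is what the auditor returns; converting it into a lower bound on the privacy parameter is a separate step that is not shown here.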