Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Sequentially Auditing Differential Privacy
Authors: Tomás González Lara, Mateo Dulce Rubio, Aaditya Ramdas, Mónica Ribero
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show this test detects violations with sample sizes that are orders of magnitude smaller than existing methods, reducing this number from 50K to a few hundred examples, across diverse realistic mechanisms. Notably, it identifies DP-SGD privacy violations in under one training run, unlike prior methods needing full model training. ... We validate our methods on common DP mechanisms with Gaussian and Laplace noise. We further demonstrate efficacy on auditing benchmark algorithms [26, 5] and provide results for the challenging case of DP-SGD [1, 46], showcasing the practical benefits of early failure detection enabled by our sequential approach. |
| Researcher Affiliation | Collaboration | Tomás González, Carnegie Mellon University, EMAIL; Mateo Dulce Rubio, New York University, EMAIL; Aaditya Ramdas, Carnegie Mellon University, EMAIL; Mónica Ribero, Google Research, EMAIL |
| Pseudocode | Yes | Algorithm 1: Sequential DP Auditing; Algorithm 2: Sequential DP Auditing with an E-process; Algorithm 3: Online Newton Step in 1D; Algorithm 4: Online Gradient Ascent in RKHS |
| Open Source Code | Yes | The code to replicate our experiments is publicly available: https://github.com/google-research/google-research/tree/master/dp_sequential_test |
| Open Datasets | No | The paper describes using synthetic data in Section 4.1 by fixing "neighboring datasets to S = {0} and S′ = {0, 1}" and implies use of data for DP-SGD in Section 4.2, but it does not name a specific publicly available dataset (such as CIFAR-10 or ImageNet) or provide access information for any other dataset used. |
| Dataset Splits | Yes | Moreover, we use 20 initial samples to set the bandwidth for the MMD tester using the median of the pairwise distances [17], which are then excluded from the actual testing phase to maintain statistical validity. We repeat each experiment 20 times and report the aggregated findings to ensure robust results and account for statistical variability. |
| Hardware Specification | Yes | All the experiments presented in the main text and in the following subsections were conducted using Google Colab's standard CPU runtime environment (12.7 GB RAM) with Python 3. |
| Software Dependencies | No | The paper mentions "Python 3" but does not specify a version number or any other software libraries with their version numbers that are critical to replicate the experiments. |
| Experiment Setup | Yes | For each setting, we test the null hypothesis that the mechanism satisfies (ε, δ)-DP using the characterization in Definition 3.2 against the alternative that it does not. For this set of experiments, we fix the neighboring datasets to S = {0} and S′ = {0, 1}, although the sequential test remains agnostic of the specific choice of neighboring datasets. Moreover, we use 20 initial samples to set the bandwidth for the MMD tester using the median of the pairwise distances [17], which are then excluded from the actual testing phase to maintain statistical validity. We repeat each experiment 20 times and report the aggregated findings to ensure robust results and account for statistical variability. We report a failure to reject the null (no violation detected) when the test reaches 2,000 observations for ε = 0.01 and 5,000 samples for ε = 0.1. |
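
The bandwidth calibration quoted in the Experiment Setup row (median of pairwise distances over 20 initial samples, which are then held out of the test) can be illustrated with a minimal sketch. This is not the authors' code: the function names `median_heuristic_bandwidth` and `split_calibration`, the Gaussian stand-in for mechanism outputs, and the choice of Euclidean distance are all assumptions made for illustration.

```python
import numpy as np


def median_heuristic_bandwidth(samples) -> float:
    """Median of pairwise Euclidean distances between distinct samples.

    Hypothetical helper: an illustration of the median heuristic,
    not the implementation used in the paper.
    """
    x = np.asarray(samples, dtype=float)
    if x.ndim == 1:
        x = x[:, None]  # treat a 1D input as n points in one dimension
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    upper = np.triu_indices(len(x), k=1)  # each unordered pair once, no self-pairs
    return float(np.median(dists[upper]))


def split_calibration(stream, n_calibration: int = 20):
    """Split a stream into calibration samples and the remaining test samples."""
    stream = np.asarray(stream)
    return stream[:n_calibration], stream[n_calibration:]


# Assumed stand-in for mechanism outputs: standard Gaussian draws.
rng = np.random.default_rng(0)
outputs = rng.normal(loc=0.0, scale=1.0, size=200)

calibration, test_stream = split_calibration(outputs, n_calibration=20)
bandwidth = median_heuristic_bandwidth(calibration)
```

Only `test_stream` would be passed on to the sequential test. Keeping the calibration samples out of it means the bandwidth is fixed before any tested observation is seen, which is the reason the quoted passage gives for excluding them.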