Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Nearly Tight Black-Box Auditing of Differentially Private Machine Learning

Authors: Meenatchi Sundaram Muthu Selva Annamalai, Emiliano De Cristofaro

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For models trained on MNIST and CIFAR-10 at theoretical ε = 10.0, our auditing procedure yields empirical estimates of εemp = 7.21 and 6.95, respectively, on a 1,000-record sample and εemp = 6.48 and 4.96 on the full datasets. The source code needed to reproduce our experiments is available from https://github.com/spalabucr/bb-audit-dpsgd.
Researcher Affiliation | Academia | Meenatchi Sundaram Muthu Selva Annamalai University College London EMAIL Emiliano De Cristofaro University of California, Riverside EMAIL
Pseudocode | Yes | Algorithm 1 Differentially Private Stochastic Gradient Descent (DP-SGD) [1] Algorithm 2 Auditing DP-SGD
Open Source Code | Yes | The source code needed to reproduce our experiments is available from https://github.com/spalabucr/bb-audit-dpsgd.
Open Datasets | Yes | We experiment with the MNIST [24] and CIFAR-10 [23] datasets.
Dataset Splits | Yes | To ensure a fair comparison between the average-case and worst-case initial parameter settings, we split the training data in two and privately train on only half of each dataset (30,000 images for MNIST and 25,000 for CIFAR-10). This threshold, which we denote as τ, must be computed on a separate set of observations (e.g., a validation set) for the εemp to constitute a technically valid lower bound.
Hardware Specification | Yes | All our experiments are run on a cluster using 4 NVIDIA A100 GPUs, 64 CPU cores, and 100GB of RAM.
Software Dependencies | No | The paper mentions Opacus [40], TensorFlow [19], and JAX [4] as supported libraries and refers to the 'Privacy loss Random Variable accountant provided by Opacus [40]', but does not specify version numbers for any software dependencies.
Experiment Setup | Yes | For MNIST, we train for T = 100 iterations, with a learning rate of η = 4 for both average-case and worst-case initial model parameters. For CIFAR-10, we train for T = 200 iterations, with a learning rate of η = 2 for the average-case initial model parameter setting. For the worst-case initial model parameter setting, we use a learning rate of η = 1 instead... All clipping norms are set to C = 1.0... and batch sizes are set to the dataset size B = n.
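
To make the Experiment Setup row concrete, the following is a minimal sketch of one full-batch DP-SGD update (B = n) in the textbook form named in the Pseudocode row: clip each per-example gradient to norm C, sum, add Gaussian noise, average, and step. It is not the authors' implementation, which lives in the linked repository. T, η, and C are the MNIST values quoted above; the noise multiplier is not stated in this summary, so `NOISE_MULTIPLIER` is a placeholder, and the function names `clip_gradient` and `dp_sgd_step` are hypothetical.

```python
import math
import random

# MNIST hyperparameters quoted in the Experiment Setup row.
T = 100    # number of training iterations
ETA = 4.0  # learning rate (eta)
C = 1.0    # per-example clipping norm

# ASSUMPTION: the noise multiplier is not given in this summary.
# This value is a placeholder, not the one used in the paper.
NOISE_MULTIPLIER = 1.0


def clip_gradient(grad, c):
    """Rescale grad so that its L2 norm is at most c."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, c / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(theta, per_example_grads, eta, c, sigma, rng):
    """Apply one full-batch DP-SGD update and return the new parameters.

    theta: list of floats (model parameters)
    per_example_grads: one gradient (list of floats) per training record
    sigma: noise multiplier; the noise standard deviation is sigma * c
    rng: a random.Random instance, passed in so runs can be seeded
    """
    n = len(per_example_grads)
    summed = [0.0] * len(theta)
    for grad in per_example_grads:
        for i, g in enumerate(clip_gradient(grad, c)):
            summed[i] += g
    # Gaussian noise is added to the SUM of clipped gradients, then averaged.
    noisy = [s + rng.gauss(0.0, sigma * c) for s in summed]
    return [t - eta * g / n for t, g in zip(theta, noisy)]


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    # Toy per-example gradients, held fixed here purely for illustration;
    # a real run would recompute them from the model at every iteration.
    grads = [[3.0, 4.0], [0.0, 0.5], [-1.0, 0.2]]
    for _ in range(T):
        theta = dp_sgd_step(theta, grads, ETA, C, NOISE_MULTIPLIER, rng)
    print(theta)
```

Passing the random generator explicitly keeps the noise reproducible under a fixed seed, which matters when the same training run has to be repeated many times for an audit.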