Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Purifying Approximate Differential Privacy with Randomized Post-processing

Authors: Yingyu Lin, Erchi Wang, Yian Ma, Yu-Xiang Wang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We propose a framework to convert (ε, δ)-approximate Differential Privacy (DP) mechanisms into (ε , 0)-pure DP mechanisms under certain conditions, a process we call purification. This algorithmic technique leverages randomized postprocessing with calibrated noise to eliminate the δ parameter while achieving nearoptimal privacy-utility tradeoff for pure DP. It enables a new design strategy for pure DP algorithms: first run an approximate DP algorithm with certain conditions, and then purify. This approach allows one to leverage techniques such as strong composition and propose-test-release that require δ > 0 in designing pure-DP methods with δ = 0. We apply this framework in various settings, including Differentially Private Empirical Risk Minimization (DP-ERM), stability-based release, and query release tasks. To the best of our knowledge, this is the first work with a statistically and computationally efficient reduction from approximate DP to pure DP. Finally, we illustrate the use of this reduction for proving lower bounds under approximate DP constraints with explicit dependence in δ, avoiding the sophisticated fingerprinting code construction.
Researcher Affiliation Academia Yingyu Lin , Erchi Wang , Yi-An Ma, Yu-Xiang Wang University of California, San Diego EMAIL
Pseudocode Yes Algorithm 1: Apure(xapx, Θ, ε , δ, ω): Purification of Approximate Differential Privacy Algorithm 2: Apure-discrete(ε, δ, uapx, Y): Binary Embedding Purification for Finite Spaces Algorithm 3: Differentially Private SGD [ACG+16] Algorithm 4: Pure DP SGD Algorithm 5: Frank-Wolfe algorithm (Non-Private) Algorithm 6: Approximate DP Frank-Wolfe Algorithm [TGTZ15] Algorithm 7: Pure DP Frank-Wolfe Algorithm Algorithm 8: Sparse Vector Recovery Mrec(b, Φ, ξ) Algorithm 9: MPTR(X, q, ε, δ, β): Propose-Test-Release [DL09] Algorithm 10: Pure DP Propose-Test-Release Algorithm 11: Algorithm 12: Pure DP Mode Release Algorithm 13: Mpure Ada SSP Algorithm 14: Multipliative Weight Exponential Mechanism MWEM(D, Q, T, ρ) [HLM12] Algorithm 15: Proportional Sampling(X, p, m) Algorithm 16: Pure DP Multiplicative Weight Exponential Mechanism
Open Source Code No The paper does not provide explicit statements about releasing code or links to source code repositories. The NeurIPS checklist at the end of the paper states 'The answer NA means that the paper does not include experiments.' and 'The answer NA means that paper does not include experiments requiring code.', further indicating no code release for the described methodology.
Open Datasets No The paper is theoretical and discusses abstract datasets in problem settings (e.g., 'dataset D', 'data universe X') for theoretical analysis. It does not mention specific datasets used in experiments, nor does it provide any links, DOIs, repositories, or citations for publicly available or open datasets.
Dataset Splits No The paper is theoretical and focuses on algorithm design and proofs, not empirical evaluation. Therefore, it does not describe training, testing, or validation dataset splits.
Hardware Specification No The paper is theoretical and describes algorithms and proofs. It does not mention any specific hardware (e.g., GPU/CPU models, memory) used for running experiments.
Software Dependencies No The paper is theoretical and describes algorithms and proofs. It does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that would be needed to replicate experiments.
Experiment Setup No The paper is theoretical and focuses on algorithm design, privacy analysis, and utility bounds. It does not include experimental results and therefore does not provide details on experimental setup such as hyperparameters or training configurations.