Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Privacy amplification by random allocation

Authors: Moshe Shenfeld, Vitaly Feldman

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Numerical evaluation: In Section 5 we provide numerical evaluation and comparisons of our bounds to those for Poisson sampling as well as other relevant bounds. Our evaluations across many parameter regimes give bounds on the privacy of random allocation that are very close, typically within 10% of those for the Poisson subsampling with rate k/t. This means that random allocation can be used to replace Poisson subsampling with only a minor loss in privacy. At the same time, in many cases, the use of random allocation can improve utility. In the context of training neural networks via DP-SGD this was shown in [Chua et al., 2024a]. Application of our bounds also lead to improvement over Poisson subsampling in [Asi et al., 2025]. We demonstrate that even disregarding some practical disadvantages of Poisson subsampling, random allocation has a better privacy-utility trade-off for mean estimation in low-dimensional regime. This improvement stems from the fact that random allocation computes the sum exactly whereas Poisson subsampling introduces additional variance. At the same time in the high-dimensional regime noise due to privacy dominates the final error and thus the trade-off boils down to the difference in the privacy bounds.
Researcher Affiliation	Collaboration	Vitaly Feldman Apple EMAIL Moshe Shenfeld The Hebrew University of Jerusalem EMAIL
Pseudocode	No	The paper does not contain any structured pseudocode or algorithm blocks. It focuses on theoretical derivations and numerical evaluations.
Open Source Code	Yes	Python implementation of all methods is available in a GitHub project and in a package.
Open Datasets	No	The paper does not use or provide concrete access information for any publicly available datasets. For the privacy-utility tradeoff analysis, it considers a synthetic dataset: "Consider a dataset s ∈ {0, 1}^n sampled iid from a Bernoulli distribution with expectation p ∈ [0, 1]".
Dataset Splits	No	The paper does not use publicly available datasets and therefore does not specify any training/test/validation dataset splits. The numerical evaluations are based on theoretical models or simulated data properties.
Hardware Specification	No	The paper does not provide specific hardware details (like exact GPU/CPU models, processor types, or memory amounts) used for running its numerical experiments. It generally mentions computation time on a "typical personal computer" in Appendix F.1.
Software Dependencies	No	The paper mentions "Python implementation of all methods is available in a GitHub project and in a package." but does not specify any version numbers for Python itself or any libraries/packages used.
Experiment Setup	Yes	Figure 1: Upper bounds on privacy parameter ε as a function of the noise parameter σ for various schemes and the local algorithm (no amplification), all using the Gaussian mechanism with fixed parameters δ = 10^-10, t = 10^6. In the Poisson scheme λ = 1/t. The "flat" part of the RDP based calculation is due to computational limitations, which was computed for the range α ∈ [2, 60].
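
The Research Type entry quotes the paper's remark that random allocation computes the sum exactly, whereas Poisson subsampling introduces additional variance. The sketch below illustrates only that sampling effect on the synthetic Bernoulli dataset quoted under Open Datasets. It is not the authors' implementation: the function names, the parameter values (n, p, t, k, trials), and the omission of any privacy noise are all assumptions made for illustration.

```python
import random


def bernoulli_dataset(n, p, rng):
    """Draw s in {0, 1}^n iid from a Bernoulli distribution with expectation p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def random_allocation_estimate(data, t, k, rng):
    """Assign each element to exactly k of the t steps, then rescale the total by k."""
    step_sums = [0] * t
    for x in data:
        for step in rng.sample(range(t), k):
            step_sums[step] += x
    return sum(step_sums) / k


def poisson_estimate(data, t, k, rng):
    """Include each element in each step independently with rate k/t, then rescale by k."""
    rate = k / t
    step_sums = [0] * t
    for x in data:
        for step in range(t):
            if rng.random() < rate:
                step_sums[step] += x
    return sum(step_sums) / k


def compare(n=200, p=0.3, t=20, k=2, trials=200, seed=0):
    """Return the mean squared error of both sum estimates (no privacy noise added)."""
    rng = random.Random(seed)
    data = bernoulli_dataset(n, p, rng)
    true_sum = sum(data)

    def mse(estimator):
        errors = [estimator(data, t, k, rng) - true_sum for _ in range(trials)]
        return sum(e * e for e in errors) / trials

    return mse(random_allocation_estimate), mse(poisson_estimate)


if __name__ == "__main__":
    ra_mse, poisson_mse = compare()
    print("random allocation MSE:", ra_mse)
    print("Poisson subsampling MSE:", poisson_mse)
```

Under these assumptions each element contributes exactly k times under random allocation, so the rescaled total equals the true sum, while under Poisson subsampling the number of inclusions per element is random and the estimate fluctuates. Adding Gaussian noise at each step, as in the paper's privacy-utility analysis, is deliberately left out of this sketch.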