Privacy Profiles for Private Selection

Authors: Antti Koskela, Rachel Emily Redberg, Yu-Xiang Wang

ICML 2024

Reproducibility assessment (variable, result, and the LLM response supporting it):

Research Type: Experimental
"Numerically, our approach improves over the RDP-based accounting in all regimes of interest and leads to substantial benefits in end-to-end private learning experiments. Our general result also allows analysing the case of a binomially distributed number of rounds, which leads to more concentrated distributions compared to the previously considered Poisson distribution." (See the binomial-vs-Poisson check after the table.)

Researcher Affiliation: Collaboration
Nokia Bell Labs; Northeastern University; UC San Diego.

Pseudocode: No
"The paper does not contain any clearly labeled pseudocode or algorithm blocks."

Open Source Code: No
"The paper does not provide an explicit statement about releasing its source code, or a link to a repository for the methodology described."

Open Datasets: Yes
"We apply the Generalized PTR to the one-posterior sample (OPS) algorithm described in (Redberg et al., 2023), which includes privately releasing the L2-norm of the non-private solution and also the smallest eigenvalue of the feature covariance matrix. The parameter to tune in the method is the regularization strength λ (see Alg. 7, Redberg et al., 2023), and we carry out a random search on a pre-defined logarithmically equidistant grid, meaning that we pick a random value from the grid at each of the K rounds. Notice that we could draw the candidates from any fixed probability distribution; the only requirement is that each candidate mechanism has the same privacy profile. As baselines we use the same approach with the privacy bounds of Liu & Talwar (Thm. 3.5, 2019), the output perturbation method (Chaudhuri et al., 2011), and the non-adaptive method OPS-Balanced of Wang (2018). UCI Bike dataset (n = 17379, d = 17); UCI Elevator dataset (n = 8752, d = 18)." (See the grid-search sketch after the table.)

Dataset Splits: No
"The paper mentions using the UCI Bike and UCI Elevator datasets and discusses training machine learning models, but does not specify any training, validation, or test dataset splits (e.g., percentages or sample counts)."

Hardware Specification: No
"The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments."

Software Dependencies: No
"The paper mentions the Opacus library (Yousefpour et al., 2021) but does not provide a specific version number for it or for other software dependencies necessary for replication."

Experiment Setup: Yes
"We fix q = 0.01 and set as a threshold ϵ_Q = 1.5 and δ = 10^-6. We consider three σ candidates, 2.0, 3.0 and 4.0, and for each of them the number of iterations T is determined to be the maximum such that the privacy profile of the candidate stays below the (ϵ_1, δ_1)- and (ϵ̂, δ/m)-thresholds. As a result we can run the candidate models for 4000, 10000 and 18000 iterations, respectively." (See the accountant sketch after the table.)

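On the binomial-versus-Poisson remark under Research Type: with matched means, Binomial(N, p) has variance Np(1 - p), strictly smaller than the Np of a Poisson with the same mean, so the number of rounds is more concentrated. A minimal numerical check of this standard fact; the values of N and p below are illustrative, not taken from the paper:

```python
from scipy import stats

# Match the means: Binomial(N, p) has mean N*p and variance N*p*(1 - p);
# a Poisson with the same mean mu = N*p has variance mu.
N, p = 1000, 0.01  # illustrative parameters, not from the paper
mu = N * p

binom, poisson = stats.binom(N, p), stats.poisson(mu)

print("means:    ", binom.mean(), poisson.mean())  # both 10.0
print("variances:", binom.var(), poisson.var())    # 9.9 vs 10.0
# Tail mass beyond twice the mean as a rough concentration comparison;
# the binomial tail is the lighter of the two.
print("P[K > 2*mu]:", binom.sf(2 * mu), poisson.sf(2 * mu))
```
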
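On the tuning procedure quoted under Open Datasets: the random search simply draws, at each of the K selection rounds, one regularization strength λ uniformly from a logarithmically equidistant grid. A minimal sketch under assumed grid bounds and round count (neither is given in the quote); train_ops_candidate is a hypothetical stand-in for fitting one OPS candidate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Logarithmically equidistant grid for the regularization strength;
# the bounds and grid size are illustrative assumptions.
lambda_grid = np.logspace(-4, 2, num=25)

K = 40  # number of private-selection rounds (illustrative)
for _ in range(K):
    # One uniform draw from the grid per round. Any fixed distribution
    # would do, as long as every candidate mechanism has the same
    # privacy profile.
    lam = rng.choice(lambda_grid)
    # model = train_ops_candidate(lam)  # hypothetical training call
```
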
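On the Experiment Setup row: the per-candidate iteration counts come from pushing T as high as the privacy thresholds allow. The paper's accounting is based on privacy profiles, but the same search can be illustrated with the RDP accountant from Opacus (the library the paper cites); the cap and check interval below are arbitrary choices, and the resulting T values will not reproduce the paper's 4000/10000/18000 exactly, since the underlying bounds differ:

```python
from opacus.accountants import RDPAccountant

q, delta = 0.01, 1e-6  # values quoted in the setup
eps_threshold = 1.5    # single threshold, a stand-in for the paper's
                       # (eps_1, delta_1) / (eps-hat, delta/m) pair

def max_iterations(sigma, t_cap=50_000, check_every=100):
    """Largest multiple of check_every whose epsilon stays below the
    threshold at the given delta (coarse linear scan, RDP-based)."""
    accountant = RDPAccountant()
    for t in range(1, t_cap + 1):
        accountant.step(noise_multiplier=sigma, sample_rate=q)
        if t % check_every == 0:
            if accountant.get_epsilon(delta=delta) > eps_threshold:
                return t - check_every  # last checkpoint under the threshold
    return t_cap

for sigma in (2.0, 3.0, 4.0):
    print(f"sigma={sigma}: T ~ {max_iterations(sigma)}")
```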