Provable Privacy with Non-Private Pre-Processing

Authors: Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results: We conducted experiments on a synthetic approximately low-rank dataset to corroborate our results in Proposition 9, and summarised the results in Figure 3."
Researcher Affiliation | Academia | "1Max Planck Institute for Intelligent Systems, Tübingen, Germany."
Pseudocode | Yes | "Algorithm 1: PTR for πPCA-rank on DP-GD"
Open Source Code | Yes | "An implementation for our framework is available at https://github.com/yaxihu/privacy-non-private-preprocessing."
Open Datasets | No | "Data generation: The synthetic data is generated with the make_classification function in the sklearn library. We generate a 2-class low-rank dataset consisting of 1000 data points with dimension 6000 and approximately rank 50. The synthetic dataset has positive yet small eigenvalues for the k-th eigenvectors for k > 50, ensuring 2LΛ in Equation (4) is small but positive."
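The quoted data-generation step can be sketched with sklearn as below. Only the sample count (1000), dimension (6000), approximate rank (50), and number of classes (2) come from the excerpt; all other arguments (n_redundant, random_state, and so on) are illustrative assumptions.

```python
from sklearn.datasets import make_classification

# Hedged sketch of the described synthetic dataset: 1000 points in
# dimension 6000 with roughly 50 informative directions. Arguments
# beyond n_samples / n_features / n_informative / n_classes are
# assumptions, not taken from the paper.
X, y = make_classification(
    n_samples=1000,
    n_features=6000,
    n_informative=50,   # approximately rank 50
    n_redundant=0,
    n_classes=2,
    random_state=0,
)
```

The remaining 5950 features default to uninformative noise, so the informative signal is concentrated in a 50-dimensional subspace, matching the "approximately rank 50" description.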
Dataset Splits | No | The paper discusses training and evaluation but does not explicitly provide specific training/validation/test dataset splits (percentages, counts, or predefined split references).
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies | No | "We employ non-private PCA to reduce the dimensionality of the original dataset to k and then apply private logistic regression. In particular, we use the make_private_with_epsilon method from the Opacus library with the PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10, and epochs = 10."
Experiment Setup | Yes | "In particular, we use the make_private_with_epsilon method from the Opacus library with the PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10, and epochs = 10."