reproducibilityindex.ai

Provable Privacy with Non-Private Pre-Processing

Authors: Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results: We conducted experiments on a synthetic approximately low-rank dataset to corroborate our results in Proposition 9, and summarised the results in Figure 3.
Researcher Affiliation	Academia	Max Planck Institute for Intelligent Systems, Tübingen, Germany.
Pseudocode	Yes	Algorithm 1: PTR for π_{PCA-rank} on DP-GD
Open Source Code	Yes	An implementation for our framework is available at https://github.com/yaxihu/privacy-non-private-preprocessing.
Open Datasets	No	Data generation: The synthetic data is generated with the make_classification function in the sklearn library. We generate a 2-class low-rank dataset consisting of 1000 data points with dimension 6000 and approximately rank 50. The synthetic dataset has positive yet small eigenvalues for the kth eigenvectors for k 50, ensuring 2LΛ in Equation (4) is small but positive.
Dataset Splits	No	The paper discusses training and evaluation but does not explicitly provide specific training/validation/test dataset splits (percentages, counts, or predefined split references).
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	We employ non-private PCA to reduce the dimensionality of the original dataset to k and then apply private logistic regression. In particular, we use the make_private_with_epsilon method from the Opacus library with PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10 and epochs = 10.
Experiment Setup	Yes	In particular, we use the make_private_with_epsilon method from the Opacus library with PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10 and epochs = 10.
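
The pipeline quoted in the rows above (approximately low-rank two-class data, non-private PCA down to k dimensions, then noisy gradient-based logistic regression) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain NumPy stand-ins instead of sklearn's make_classification and Opacus's make_private_with_epsilon. The function names, the scaled-down sizes (200 x 60, rank 5, rather than the quoted 1000 x 6000, rank 50), the `tail` noise level, and the fixed `noise_multiplier` are all assumptions made for illustration. Only the learning rate 1e-2, max_grad_norm = 10 and epochs = 10 come from the quoted setup. The sketch performs no privacy accounting, so it does not by itself certify any (ε, δ) guarantee.

```python
import numpy as np


def make_low_rank_data(n=200, d=60, rank=5, tail=1e-3, seed=0):
    """Two-class data that is approximately rank `rank`.

    Hypothetical NumPy stand-in for sklearn's make_classification.
    """
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, rank))      # low-dimensional signal
    basis = rng.normal(size=(rank, d))       # embeds the signal in d dimensions
    w_true = rng.normal(size=rank)
    y = (latent @ w_true > 0).astype(float)  # labels in {0, 1}
    # Small isotropic noise keeps the trailing eigenvalues positive but small.
    X = latent @ basis + tail * rng.normal(size=(n, d))
    return X, y


def pca_project(X, k):
    """Non-private PCA: project centred data onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T


def dp_gd_logistic(Z, y, epochs=10, lr=1e-2, max_grad_norm=10.0,
                   noise_multiplier=1.0, seed=0):
    """Full-batch noisy gradient descent for logistic regression.

    Each step clips per-example gradients to `max_grad_norm`, sums them,
    adds Gaussian noise, and averages. `noise_multiplier` is an assumed
    fixed value; Opacus would calibrate it from a target epsilon.
    """
    rng = np.random.default_rng(seed)
    n, k = Z.shape
    w = np.zeros(k)
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Z @ w)))
        grads = (p - y)[:, None] * Z                   # per-example gradients, shape (n, k)
        norms = np.linalg.norm(grads, axis=1)
        scale = np.minimum(1.0, max_grad_norm / np.maximum(norms, 1e-12))
        clipped = grads * scale[:, None]               # each row now has norm <= max_grad_norm
        noise = rng.normal(scale=noise_multiplier * max_grad_norm, size=k)
        w -= lr * (clipped.sum(axis=0) + noise) / n
    return w


X, y = make_low_rank_data()
Z = pca_project(X, k=5)
w = dp_gd_logistic(Z, y)
```

Because the PCA step here is fitted on the same data without any noise, the projection itself depends on the private dataset; the sketch only shows the order of operations, not how that dependence is accounted for.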