Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].
Provable Privacy with Non-Private Pre-Processing
Authors: Yaxi Hu, Amartya Sanyal, Bernhard Schölkopf
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results: We conducted experiments on a synthetic approximately low rank dataset to corroborate our results in Proposition 9, and summarised the results in Figure 3. |
| Researcher Affiliation | Academia | 1Max Planck Institute for Intelligent Systems, Tübingen, Germany. |
| Pseudocode | Yes | Algorithm 1 PTR for πPCA rank on DP-GD |
| Open Source Code | Yes | An implementation for our framework is available at https://github.com/yaxihu/privacy-non-private-preprocessing. |
| Open Datasets | No | Data generation: The synthetic data is generated with the make_classification function in the sklearn library. We generate a 2-class low rank dataset consisting of 1000 data points with dimension 6000 and approximately rank 50. The synthetic dataset has positive yet small eigenvalues for the kth eigenvectors for k > 50, ensuring 2LΛ in Equation (4) is small but positive. |
| Dataset Splits | No | The paper discusses training and evaluation but does not explicitly provide specific training/validation/test dataset splits (percentages, counts, or predefined split references). |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | We employ non-private PCA to reduce the dimensionality of the original dataset to k and then apply private logistic regression. In particular, we use the make_private_with_epsilon method from the Opacus library with PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10 and epochs = 10. |
| Experiment Setup | Yes | In particular, we use the make_private_with_epsilon method from the Opacus library with PyTorch SGD optimizer with learning rate 1e-2, max_grad_norm = 10 and epochs = 10. |
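
The experiment setup quoted above trains a private logistic regression with learning rate 1e-2, max_grad_norm = 10 and epochs = 10. The sketch below is a plain-Python illustration of the clip-and-add-noise gradient step behind that kind of training. It is not the authors' code and does not call Opacus. The names `dp_gd_step` and `train` are hypothetical, `NOISE_MULTIPLIER` is an assumed placeholder (Opacus derives the noise level from the target epsilon), and full-batch updates are a simplification of mini-batch SGD.

```python
import math
import random

LEARNING_RATE = 1e-2    # from the quoted setup
MAX_GRAD_NORM = 10.0    # from the quoted setup: per-example clipping bound
EPOCHS = 10             # from the quoted setup
NOISE_MULTIPLIER = 1.0  # assumption: placeholder, not a value from the paper


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def clip(grad, max_norm):
    """Scale a per-example gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grad]


def dp_gd_step(w, X, y, rng):
    """One full-batch noisy gradient step for logistic regression."""
    total = [0.0] * len(w)
    for x_i, y_i in zip(X, y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x_i)))
        grad_i = [(p - y_i) * xj for xj in x_i]  # per-example gradient
        for j, g in enumerate(clip(grad_i, MAX_GRAD_NORM)):
            total[j] += g
    # Gaussian noise scaled to the clipping bound, added to the summed gradient.
    sigma = NOISE_MULTIPLIER * MAX_GRAD_NORM
    noisy = [(t + rng.gauss(0.0, sigma)) / len(X) for t in total]
    return [wj - LEARNING_RATE * gj for wj, gj in zip(w, noisy)]


def train(X, y, seed=0):
    """Run EPOCHS noisy full-batch steps from a zero initialisation."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    for _ in range(EPOCHS):
        w = dp_gd_step(w, X, y, rng)
    return w
```

In this sketch `X` would hold the PCA-reduced features as a list of rows and `y` the 0/1 labels. Computing the privacy guarantee itself requires an accountant, which this illustration leaves out.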