Perturb-and-Project: Differentially Private Similarities and Marginals
Authors: Vincent Cohen-Addad, Tommaso D’Orsi, Alessandro Epasto, Vahab Mirrokni, Peilin Zhong
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Theoretical | In this work, we study input perturbation techniques from a theoretical perspective. We focus on the problem of designing differentially private projection functions (such as dimensionality reduction techniques). We observe that, quite surprisingly, input perturbations yield the best known (and conjecturally optimal) guarantees for a large class of projection functions. This challenges the expectation that a general approach might incur sub-optimal utility loss. Our analysis reveals that the algorithm's utility guarantees are contingent upon the richness (metric entropy) of the target set (the set into which the input is projected). Inspired by (d'Orsi et al., 2023), we use sum-of-squares to obtain tight bounds on the Gaussian complexity of the set of solutions. To further improve, we introduce new sum-of-squares certificates (e.g., for sparse injective tensor norms). Finally, while our sum-of-squares solution might appear complex, we extract an interesting explanation for the practical success of algorithms like linear projection. |
| Researcher Affiliation | Collaboration | ¹Google Research, ²BIDSA, Bocconi. |
| Pseudocode | Yes | Algorithm 1 Perturb-and-project; Algorithm 2 Perturb-and-alternately-project; Algorithm 3 Private pair-wise cosine similarities; Algorithm 4 Practical pair-wise cosine similarities |
| Open Source Code | No | The paper does not provide any statement or link indicating that open-source code for the described methodology is available. |
| Open Datasets | No | The paper is theoretical and does not conduct experiments on specific public datasets. It discusses 'datasets' in a general sense (e.g., 'dataset of vectors in {0, 1}n') but does not name or provide access information for any public or open dataset used for training or evaluation. |
| Dataset Splits | No | The paper is theoretical and does not describe experimental validation procedures, such as dataset splits (training, validation, test) or cross-validation setups. |
| Hardware Specification | No | The paper focuses on theoretical contributions and algorithm design; therefore, it does not mention any specific hardware specifications used for experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies with version numbers for replication. |
| Experiment Setup | No | The paper is theoretical and focuses on algorithms and proofs; it does not provide specific experimental setup details such as hyperparameters or training configurations. |
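The perturb-and-project paradigm described above (add calibrated noise to the input, then apply the projection function) can be sketched as follows. This is an illustrative toy sketch, not the paper's Algorithm 1: the function names, the fixed noise scale, and the unit-sphere projection (a stand-in target set in the spirit of the cosine-similarity application) are all assumptions, and the calibration of the noise to a concrete privacy budget is omitted.

```python
import numpy as np

def perturb_and_project(x, sigma, project, seed=0):
    """Illustrative perturb-and-project sketch: add Gaussian noise to the
    input (the perturbation step), then map the noisy point back onto the
    target set via the supplied projection function."""
    rng = np.random.default_rng(seed)
    noisy = x + rng.normal(0.0, sigma, size=x.shape)
    return project(noisy)

def unit_sphere(v):
    """Hypothetical target set: projection onto the unit sphere
    (i.e., L2 normalization), as one might use for cosine similarities."""
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

x = np.ones(4)
y = perturb_and_project(x, sigma=0.1, project=unit_sphere)
```

The point of the paradigm is that privacy comes entirely from the noise added to the input, while the projection step (here, normalization) only improves utility and costs no additional privacy by post-processing.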