On Differentially Private Subspace Estimation in a Distribution-Free Setting
Authors: Eliad Tsfadia
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4 we empirically compared our method to the additive-gap based approach of Dwork et al. [2014] for the task of privately estimating the empirical mean of points that are close to a small dimensional subspace, demonstrating the advantage of our approach in high-dimensional regimes. |
| Researcher Affiliation | Academia | Eliad Tsfadia, Department of Computer Science, Georgetown University, eliadtsfadia@gmail.com |
| Pseudocode | Yes | Algorithm C.1 (Algorithm Est Subspace). Input: A dataset X = (x_1, . . . , x_n) ∈ (S^d)^n. Parameters: k, t. Oracle: A DP algorithm Agg for aggregating projection matrices. Operation: ... (This is one example; the paper contains many such algorithm blocks.) A hedged sketch of this sample-and-aggregate structure is given below the table. |
| Open Source Code | No | The paper mentions the source code for the "Friendly Core-based averaging algorithm of Tsfadia et al. [2022]" is publicly available, but it does not provide an explicit statement or link for the source code of the methodology described in this paper. |
| Open Datasets | No | In order to generate a synthetic dataset that approximately lies in a k-dimensional subspace, we initially sample uniformly random b_1, . . . , b_k ∈ {-1, 1}^d and perform the following process to generate each data point: (i) Sample a random unit vector u in Span{b_1, . . . , b_k}, (ii) Sample a random noise vector ν ∈ {-1/τ, 1/τ}^d, and (iii) Output (u + ν)/‖u + ν‖. A hedged code sketch of this generation process is also given below the table. |
| Dataset Splits | No | The paper describes the generation of a synthetic dataset and then directly uses it for experiments and comparisons, but it does not specify explicit training, validation, and test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | All experiments were tested on a MacBook Pro laptop with an 8-core Apple M1 CPU and 16GB RAM. |
| Software Dependencies | No | The paper mentions implementing the algorithm in "Python" and using the "sklearn library" (randomized_svd function), but it does not specify version numbers for these software components. |
| Experiment Setup | Yes | In all our experiments, we use ρ = 2 and δ = 10^-5, t = 125 (the number of subsets in the sample-and-aggregate process), n = 2tk data points, q = 10k (the number of reference points in the aggregation), and use the zCDP implementation of the Friendly Core-based averaging algorithm of Tsfadia et al. [2022]. |
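
The Pseudocode row quotes Algorithm C.1, which splits the dataset into t subsets, computes a rank-k projection matrix per subset, and hands those matrices to a DP aggregation oracle Agg. The skeleton below is a minimal, illustrative reconstruction of only that non-private sample-and-aggregate scaffolding: the function name `est_subspace`, the NumPy-based splitting, and the use of sklearn's `randomized_svd` (mentioned in the Software Dependencies row) are assumptions, and the DP aggregator (e.g., the Friendly Core-based averaging of Tsfadia et al. [2022]) is left as an abstract callable rather than implemented.

```python
import numpy as np
from sklearn.utils.extmath import randomized_svd

def est_subspace(X, k, t, agg):
    """Sketch of the sample-and-aggregate structure quoted from Algorithm C.1.

    X   : (n, d) array of unit vectors (the dataset).
    k   : target subspace dimension.
    t   : number of disjoint subsets in the sample-and-aggregate step.
    agg : a DP aggregation oracle mapping a list of d x d projection matrices
          to a single privately aggregated projection matrix (not implemented here).
    """
    n, d = X.shape

    # Split the dataset into t disjoint subsets of (roughly) equal size.
    subsets = np.array_split(np.arange(n), t)

    projections = []
    for idx in subsets:
        # Top-k right singular subspace of the subset, via sklearn's randomized SVD.
        _, _, Vt = randomized_svd(X[idx], n_components=k, random_state=0)
        # Rank-k projection matrix onto the estimated subspace of this subset.
        projections.append(Vt.T @ Vt)

    # Aggregate the per-subset projection matrices with the DP oracle.
    return agg(projections)
```

As noted in the Open Source Code row, the author's implementation is not public, so this scaffold should be read as a reconstruction of the quoted pseudocode's interface rather than the released method.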
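
The Open Datasets row quotes the three-step synthetic data generation process. The following is a minimal sketch of that process under the stated description; the function name, parameter names (n, d, k, tau), and the use of NumPy are assumptions, not the author's released code.

```python
import numpy as np

def generate_synthetic_dataset(n, d, k, tau, rng=None):
    """Generate n unit vectors in R^d lying close to a random k-dimensional subspace.

    Follows the process quoted above: sample sign vectors b_1, ..., b_k, draw each
    point as a random unit vector in their span, add +-1/tau coordinate noise, and
    renormalize. Parameter names are illustrative.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Sample uniformly random b_1, ..., b_k in {-1, +1}^d spanning the target subspace.
    B = rng.choice([-1.0, 1.0], size=(d, k))

    # Orthonormal basis of Span{b_1, ..., b_k}, used to draw uniform unit vectors in the span.
    Q, _ = np.linalg.qr(B)

    X = np.empty((n, d))
    for i in range(n):
        # (i) Random unit vector u in the span: Gaussian coefficients, then normalize.
        u = Q @ rng.standard_normal(k)
        u /= np.linalg.norm(u)

        # (ii) Random noise vector nu in {-1/tau, +1/tau}^d.
        nu = rng.choice([-1.0 / tau, 1.0 / tau], size=d)

        # (iii) Output the normalized noisy point (u + nu) / ||u + nu||.
        x = u + nu
        X[i] = x / np.linalg.norm(x)
    return X
```

With the setup quoted in the Experiment Setup row, one would call this with n = 2tk (t = 125); the ambient dimension d and noise scale τ used in the paper's experiments are not reproduced here.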