Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

An Iterative Algorithm for Differentially Private $k$-PCA with Adaptive Noise

Authors: Johanna Düngler, Amartya Sanyal

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In our experiments, we compare k-DP-PCA and k-DP-Ojas against two modified versions of the DP-Gauss algorithms of Dwork et al. [2014b] and a modified version of the noisy power method [Hardt and Price, 2014]. All of these works operate in a deterministic setting, and require some form of norm ... [Figure 1 panels: (a) σ = 0.025, (b) σ = 0.001, (c) growing dimension] Figure 1: Comparison of k-DP-PCA vs DP-Gauss-1 (input perturbation), DP-Gauss-2 (output perturbation), and DP-Power-Method on the spiked covariance model. We plot the mean over 50 trials, with shaded regions representing 95% confidence intervals. We set k = 2, d = 200, λ1 = 10, ε = 1, and δ = 0.01.
Researcher Affiliation | Academia | Johanna Düngler, Department of Computer Science, University of Copenhagen, EMAIL; Amartya Sanyal, Department of Computer Science, University of Copenhagen, EMAIL
Pseudocode | Yes | Algorithm 1 k-DP-PCA... Algorithm 2 Modified DP-PCA... Algorithm 3 DP-Ojas... Algorithm 4 Black Box PCA... Algorithm 5 Oja's Algorithm... Algorithm 6 Top-Eigenvalue-Estimation... Algorithm 7 Private-Mean-Estimation
Open Source Code | No | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [No] Justification: We will release the code publicly after we have cleaned it.
Open Datasets | No | Experimental Results using Spiked Covariance Data: We evaluate all methods on the spiked covariance model (see Example 1). Figures 1a and 1b show utility as a function of sample size for large and small noise levels, respectively. Our results show that across both regimes, k-DP-PCA consistently outperforms the baselines, with the gap widening when the noise level is significantly smaller than the signal strength ($\sigma \ll \lambda_1$). Figure 1c examines the effect of increasing ambient dimension d at fixed n. As d grows, the DP-Gauss methods' and Power-Method's utility degrades faster than k-DP-PCA's, reflecting the fact that their theoretical utility scales like $O(d^{3/2}/n)$, whereas our guarantee only incurs a linear dependence on d. ... The numerical experiments were run on synthetic data and are therefore not related to any private or personal data, and there's no explicit negative social impacts.
Dataset Splits | No | The paper uses synthetically generated data, described as 'We sample data from the spiked covariance model...' and 'For the case k = 1, we generate samples via...'. It reports results as 'We plot the mean over 50 trials', indicating repeated experiments rather than fixed train/test/validation splits from a single dataset. No specific dataset split percentages, counts, or methodologies are provided for reproduction.
Hardware Specification | Yes | Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments? Answer: [Yes] Justification: All experiments were run locally on a Mac Book M3 Pro.
Software Dependencies | No | The paper mentions comparing against 'DP-Gauss algorithms of Dwork et al. [2014b]' and 'noisy power method [Hardt and Price, 2014]', which are algorithmic baselines. However, it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow, or specific libraries) used for implementing these algorithms or the proposed methods. Therefore, no specific ancillary software dependencies with versions are provided.
Experiment Setup | Yes | We set $\beta = C\lambda_1 + \sigma\sqrt{d \log(n/\zeta)}$ for DP-Gauss-1 and DP-Gauss-2, where n is the number of samples and $1 - \zeta$ is the probability of not clipping. We set ζ = 0.01 uniformly across all methods, including our algorithms (MODIFIEDDP-PCA and k-DP-Ojas) as well as both Gauss baselines. For both k-DP-PCA and k-DP-Ojas, the parameters K and a (as defined in Assumption A) must be provided as inputs. In the case of data generated as described above, we have a = 1 and K = O(1), and thus we set a = 1 and K = 1 for our experiments. Additionally, k-DP-PCA requires specifying a batch size B, which is used in the PRIVMEAN algorithm. While the theoretical analysis suggests that the optimal choice is $B = n/\log^3(n)$, where n is the sample size, we found empirically that setting B = n yielded improved performance in practice. Lastly, we need to set a learning rate for k-DP-PCA and k-DP-Ojas. For k-DP-PCA we set the learning rates to be $\eta_t^i = 1/(20\sigma\lambda_i + (\lambda_i - \lambda_{i+1})\, t/\log(n))$, where t refers to the t-th update step inside of MODIFIEDDP-PCA ($t \in [T]$ where T = n/B) and i to the i-th iteration of k-DP-PCA. For k-DP-Ojas we empirically found that simply choosing a decreasing learning rate (independent of eigenvalues) resulted in good performance, so we set the learning rate to be $\eta_j = 1/(1 + j)$ for $j \in [n]$ for all k iterations of k-DP-Ojas.
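The learning-rate and batch-size choices quoted in the Experiment Setup row can be written out as small helper functions. This is only a sketch under stated assumptions, not the authors' code (which is not released): the function names are hypothetical, the logarithm is assumed to be natural, the theoretical batch size is rounded down, and the value used for λ2 in the demo is made up for illustration (σ = 0.025 and λ1 = 10 come from the Figure 1 caption).

```python
import math


def theoretical_batch_size(n: int) -> int:
    """Batch size suggested by the analysis, B = n / log^3(n).

    Assumption: natural log, result floored and clamped to at least 1.
    """
    return max(1, int(n / math.log(n) ** 3))


def k_dp_pca_learning_rate(t: int, n: int, sigma: float,
                           lam_i: float, lam_next: float) -> float:
    """eta_t^i = 1 / (20*sigma*lam_i + (lam_i - lam_next) * t / log(n)).

    t is the update step inside the i-th round; lam_i and lam_next stand
    for the i-th and (i+1)-th eigenvalues.
    """
    return 1.0 / (20.0 * sigma * lam_i + (lam_i - lam_next) * t / math.log(n))


def k_dp_ojas_learning_rate(j: int) -> float:
    """Eigenvalue-independent decreasing schedule, eta_j = 1 / (1 + j)."""
    return 1.0 / (1.0 + j)


if __name__ == "__main__":
    n = 10_000                       # hypothetical sample size
    sigma, lam1 = 0.025, 10.0        # values from the Figure 1 caption
    lam2 = 5.0                       # hypothetical second eigenvalue
    print("theoretical B:", theoretical_batch_size(n))
    for t in (1, 2, 3):
        print(t, k_dp_pca_learning_rate(t, n, sigma, lam1, lam2))
    for j in (1, 10, 100):
        print(j, k_dp_ojas_learning_rate(j))
```

Note that with the empirical choice B = n reported above, T = n/B = 1, so each round of k-DP-PCA would take a single update step and only t = 1 is used.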
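The Figure 1 caption reports the mean over 50 trials with 95% confidence intervals, but the quoted text does not say how the intervals were computed. One common way to aggregate repeated trials is sketched below; it assumes a normal approximation (mean ± 1.96 standard errors), which may differ from what the authors did, and the function name is hypothetical.

```python
import math
import statistics


def mean_and_ci95(values):
    """Return (mean, lower, upper) for a list of per-trial utilities.

    Assumption: normal-approximation interval, mean +/- 1.96 * standard
    error, using the sample standard deviation.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two trials to estimate a standard error")
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width


if __name__ == "__main__":
    # Made-up utilities from three trials, purely for illustration.
    print(mean_and_ci95([0.91, 0.88, 0.93]))
```

For small trial counts, a Student-t quantile in place of 1.96 would give a slightly wider interval; with 50 trials the difference is minor.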