Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Balancing Utility and Privacy: Dynamically Private SGD with Random Projection

Authors: Zhanhong Jiang, Md Zahid Hasan, Nastaran Saadati, Aditya Balu, Chao Liu, Soumik Sarkar

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments across diverse datasets show that D2P2-SGD remarkably enhances accuracy while maintaining privacy. Our code is available here. ... Extensive evaluations on a wide spectrum of datasets confirm that D2P2-SGD significantly improves model accuracy compared to baseline methods.
Researcher Affiliation Academia Zhanhong Jiang# (EMAIL), Md Zahid Hasan** (EMAIL), Nastaran Saadati* (EMAIL), Aditya Balu# (EMAIL), Chao Liu*** (EMAIL), Soumik Sarkar*# (EMAIL). *Department of Mechanical Engineering, #Translational AI Center, **Department of Electrical and Computer Engineering, Iowa State University; ***Department of Energy and Power Engineering, Tsinghua University
Pseudocode Yes Algorithm 1: D2P2-SGD
1: Initialize: model parameters x_1, step size α, number of epochs K, lower dimension p, random matrices A_1, A_2, ..., A_K, mini-batch size B, training dataset D, noise sequence σ²_{ε,1}, σ²_{ε,2}, ..., σ²_{ε,K}, gradient clipping parameter γ
2: for k = 1, ..., K do
3:   Split the dataset D into mini-batches of size B and randomly sample one mini-batch B_k
4:   Compute per-sample clipped gradients: ĝ_k^s = ∇f(x_k; s) / (‖∇f(x_k; s)‖ + γ), for s ∈ B_k
5:   Calculate the mini-batch stochastic gradient: g_k = (1/B) Σ_{s∈B_k} ĝ_k^s
6:   Project the noisy gradient using A_k: g̃_k = A_k((1/p) A_kᵀ g_k + ε_k), where ε_k ~ N(0, σ²_{ε,k} I_p)
7:   Update model parameters: x_{k+1} = x_k − α g̃_k
8: end for
9: return x_K
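The algorithm above can be sketched in plain NumPy. This is a minimal illustration of the loop structure (per-sample normalized clipping, low-dimensional projection with added noise, and lift-back), not the authors' implementation; the function names, the `noise_std` parameterization (standard deviation rather than the paper's variance σ²_{ε,k}), and the Gaussian choice for A_k are assumptions made for the sketch.

```python
import numpy as np

def d2p2_sgd(grad_fn, x1, alpha, K, p, B, dataset, noise_std, gamma, rng=None):
    """Sketch of D2P2-SGD (assumed details; see lead-in for caveats).

    grad_fn(x, sample) -> per-sample gradient of f at x.
    noise_std: sequence of length K with the per-epoch noise scale.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x1, dtype=float).copy()
    d = x.size
    for k in range(K):
        # Randomly sample one mini-batch of size B from the dataset.
        batch = rng.choice(len(dataset), size=B, replace=False)
        # Per-sample normalized clipping: g_hat = grad / (||grad|| + gamma).
        g = np.zeros(d)
        for s in batch:
            gs = grad_fn(x, dataset[s])
            g += gs / (np.linalg.norm(gs) + gamma)
        g /= B
        # Random projection: compress with A_k^T, add low-dim noise, lift back.
        A = rng.standard_normal((d, p))  # assumed Gaussian random matrix
        eps = rng.normal(0.0, noise_std[k], size=p)
        g_tilde = A @ ((A.T @ g) / p + eps)
        x = x - alpha * g_tilde
    return x
```

With zero noise and p = d this reduces to SGD with a randomly rescaled, clipped gradient, which makes the privacy/utility knobs (p and σ_ε) easy to experiment with in isolation.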
Open Source Code Yes Extensive experiments across diverse datasets show that D2P2-SGD remarkably enhances accuracy while maintaining privacy. Our code is available here.
Open Datasets Yes Additionally, the datasets for testing our algorithms include Fashion MNIST and SVHN Figueroa (2019). ...In Figures 8 and 9, results for the CIFAR-10 dataset are provided... ...Similarly, for Figures 11-13 (KMNIST, EMNIST, MNIST), D2P2-SGD is favorably comparable to or outperforms all baselines...
Dataset Splits No Split the dataset D into mini-batches of size B and randomly sample one mini-batch B
Hardware Specification Yes All the experiments were conducted on a machine equipped with an Intel Xeon Silver 4110 CPU and an NVIDIA Titan RTX GPU.
Software Dependencies No We leverage the Opacus library Yousefpour et al. (2021) and build the framework on top of it.
Experiment Setup Yes Table 6: Hyperparameters for experiments.
Hyperparameter                           | Value
Learning rate α                          | 0.01
Clipping parameter γ                     | 0.01
Batch size B                             | (256, 512, 1024)
Number of epochs K                       | 40
Injected noise variance σ_ε              | 3.0
Sampling variance                        | 1
Percentage of dimensionality reduction   | 0.7
Number of random seeds                   | 4
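For reference, the reported hyperparameters map onto a plain configuration dictionary. The key names below are illustrative placeholders, not identifiers from the authors' code.

```python
# Hypothetical config mirroring Table 6 of the paper (key names assumed).
hparams = {
    "learning_rate": 0.01,            # step size alpha
    "clip_param": 0.01,               # gradient clipping parameter gamma
    "batch_sizes": (256, 512, 1024),  # B, swept over three values
    "num_epochs": 40,                 # K
    "noise_variance": 3.0,            # injected noise variance sigma_eps
    "sampling_variance": 1,
    "dim_reduction_pct": 0.7,         # percentage of dimensionality reduction
    "num_seeds": 4,                   # random seeds per configuration
}
```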