Easy Differentially Private Linear Regression

Authors: Kareem Amin, Matthew Joseph, Mónica Ribero, Sergei Vassilvitskii

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate all four algorithms on the following datasets. The first dataset is synthetic, and the rest are real. Our main experiments compare the four methods at (ln(3), 10^-5)-DP. A concise summary of the experiment results appears in Figure 1.
Researcher Affiliation | Collaboration | {kamin, mtjoseph, mribero, sergeiv}@google.com. Part of this work was done while Mónica was at UT Austin.
Pseudocode | Yes | Algorithm 1: PTRCheck. Algorithm 2: Tukey EM.
Open Source Code | Yes | All experiment code can be found on GitHub (Google, 2022).
Open Datasets | Yes | 1. Synthetic (d = 11, n = 22,000; Pedregosa et al., 2011). 2. California (d = 9, n = 20,433; Nugent, 2017), predicting house price. 3. Diamonds (d = 10, n = 53,940; Agarwal, 2017), predicting diamond price.
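The synthetic dataset above can be sketched with scikit-learn's `make_regression`, which the paper cites (Pedregosa et al., 2011). The exact generator parameters are assumptions: here d = 11 is taken to mean 10 features plus a label column, and the noise level and seed are illustrative, not taken from the paper.

```python
from sklearn.datasets import make_regression

# Sketch of the synthetic regression dataset (n = 22,000 rows).
# Assumption: d = 11 counts 10 features plus the label; noise and
# random_state are illustrative choices, not from the paper.
X, y = make_regression(n_samples=22_000, n_features=10, noise=10.0, random_state=0)
print(X.shape, y.shape)  # (22000, 10) (22000,)
```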
Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, sample counts, or cross-validation setup) are explicitly stated in the paper.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were provided.
Software Dependencies | No | The paper mentions software like "TensorFlow Privacy and Keras (Chollet et al., 2015)" and "sklearn.make_regression" but does not provide specific version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | Our experiments tune DPSGD over a large grid consisting of 2,184 joint hyperparameter settings: learning rate {10^-6, 10^-5, ..., 1}, clip norm {10^-6, 10^-5, ..., 10^6}, microbatches {2^5, 2^6, ..., 2^10}, and epochs {1, 5, 10, 20}. Figure 4: Hyperparameter settings used by DPSGD on each dataset.
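The stated grid size can be verified by enumerating the Cartesian product of the four ranges: 7 learning rates × 13 clip norms × 6 microbatch sizes × 4 epoch counts = 2,184. A minimal sketch, assuming the stated values are spaced as written:

```python
import itertools

# Hyperparameter ranges as described in the experiment setup.
learning_rates = [10.0 ** k for k in range(-6, 1)]   # 10^-6 ... 1  (7 values)
clip_norms = [10.0 ** k for k in range(-6, 7)]       # 10^-6 ... 10^6  (13 values)
microbatches = [2 ** k for k in range(5, 11)]        # 2^5 ... 2^10  (6 values)
epochs = [1, 5, 10, 20]                              # 4 values

# Cartesian product over all four axes gives the joint settings.
grid = list(itertools.product(learning_rates, clip_norms, microbatches, epochs))
print(len(grid))  # 2184
```

This confirms the figure of 2,184 joint settings quoted in the row above.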