Easy Differentially Private Linear Regression

Authors: Kareem Amin, Matthew Joseph, Mónica Ribero, Sergei Vassilvitskii

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate all four algorithms on the following datasets. The first dataset is synthetic, and the rest are real. Our main experiments compare the four methods at (ln(3), 10^-5)-DP. A concise summary of the experiment results appears in Figure 1.
Researcher Affiliation | Collaboration | {kamin, mtjoseph, mribero, sergeiv}@google.com. Part of this work was done while Mónica was at UT Austin.
Pseudocode | Yes | Algorithm 1: PTRCheck. Algorithm 2: Tukey EM.
Open Source Code | Yes | All experiment code can be found on GitHub (Google, 2022).
Open Datasets | Yes | 1. Synthetic (d = 11, n = 22,000; Pedregosa et al., 2011). 2. California (d = 9, n = 20,433; Nugent, 2017), predicting house price. 3. Diamonds (d = 10, n = 53,940; Agarwal, 2017), predicting diamond price.
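The synthetic dataset above can be sketched with scikit-learn's `make_regression`, which the paper cites (Pedregosa et al., 2011). The exact generator parameters are assumptions: here d = 11 is taken to mean 10 features plus a label column, and the noise level and seed are illustrative, not taken from the paper.

```python
from sklearn.datasets import make_regression

# Sketch of the synthetic regression dataset (n = 22,000 rows).
# Assumption: d = 11 counts 10 features plus the label; noise and
# random_state are illustrative choices, not from the paper.
X, y = make_regression(n_samples=22_000, n_features=10, noise=10.0, random_state=0)
print(X.shape, y.shape)  # (22000, 10) (22000,)
```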
Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, sample counts, or cross-validation setup) are explicitly stated in the paper.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running the experiments were provided.
Software Dependencies | No | The paper mentions software like "TensorFlow Privacy and Keras (Chollet et al., 2015)" and "sklearn.make_regression" but does not provide specific version numbers for these software dependencies, which are required for reproducibility.
Experiment Setup | Yes | Our experiments tune DPSGD over a large grid consisting of 2,184 joint hyperparameter settings: learning rate {10^-6, 10^-5, ..., 1}, clip norm {10^-6, 10^-5, ..., 10^6}, microbatches {2^5, 2^6, ..., 2^10}, and epochs {1, 5, 10, 20}. Figure 4: Hyperparameter settings used by DPSGD on each dataset.
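The stated grid size can be verified by enumerating the Cartesian product of the four ranges: 7 learning rates × 13 clip norms × 6 microbatch sizes × 4 epoch counts = 2,184. A minimal sketch, assuming the stated values are spaced as written:

```python
import itertools

# Hyperparameter ranges as described in the experiment setup.
learning_rates = [10.0 ** k for k in range(-6, 1)]   # 10^-6 ... 1  (7 values)
clip_norms = [10.0 ** k for k in range(-6, 7)]       # 10^-6 ... 10^6  (13 values)
microbatches = [2 ** k for k in range(5, 11)]        # 2^5 ... 2^10  (6 values)
epochs = [1, 5, 10, 20]                              # 4 values

# Cartesian product over all four axes gives the joint settings.
grid = list(itertools.product(learning_rates, clip_norms, microbatches, epochs))
print(len(grid))  # 2184
```

This confirms the figure of 2,184 joint settings quoted in the row above.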