Better Locally Private Sparse Estimation Given Multiple Samples Per User
Authors: Yuheng Ma, Ke Jia, Hanfang Yang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods. |
| Researcher Affiliation | Academia | ¹School of Statistics, Renmin University of China; ²Center for Applied Statistics, Renmin University of China. Correspondence to: Hanfang Yang <hyang@ruc.edu.cn>. |
| Pseudocode | Yes | Algorithm 1: Two-round ULDP sparse estimation; Algorithm 2: LocalRnd (Bassily et al., 2020); Algorithm 3: FreqOracle (Bassily et al., 2020); Algorithm 4: Heavy Hitter; Algorithm 5: ULDPSCO; Algorithm 6: Multi-round ULDP sparse linear regression; Algorithm 7: Range; Algorithm 8: Mean; Algorithm 9: ULDPMean. |
| Open Source Code | Yes | The code is publicly available on GitHub: https://github.com/Karlmyh/ULDP-SL |
| Open Datasets | Yes | Airline: the Airlines-Departure-Delay dataset, originally from the United States Department of Transportation and currently available on OpenML (LeDell, 2020); Loan: the Loan-Default-Prediction dataset, obtained from the training set of the Kaggle Loan Default Prediction challenge (DrivenData, 2021a); Mip: the MIP-2016-regression dataset, available on OpenML, comprising 1,090 instances with 144 attributes and 1 output attribute (Bergdoll, 2019); Taxi: the Taxi dataset, obtained from the Differential Privacy Temporal Map Challenge (DrivenData, 2021b); Wine: the Wine Quality dataset (Cortez et al., 2009) from the UCI Machine Learning Repository; Yolanda: the Yolanda dataset (Guyon et al., 2019). |
| Dataset Splits | No | We do not perform any parameter selection (e.g., cross-validation or a validation set), since it is prohibitive under the locally private setting (Ma & Yang, 2024; Ma et al., 2024a) or would cost too much privacy budget (Papernot & Steinke, 2021). |
| Hardware Specification | Yes | All experiments are conducted on a machine with 72-core Intel Xeon 2.60GHz and 128GB of main memory. |
| Software Dependencies | No | The conventional Lasso regressor is fitted using the LassoCV class in the scikit-learn package (Pedregosa et al., 2011); see the LassoCV sketch after the table. While scikit-learn is mentioned, a specific version number is not provided. |
| Experiment Setup | Yes | For each model, we report the best result over its parameter grid, with the best result determined by the average of at least 30 replications (see the grid-evaluation sketch after the table). The number of screened variables is set to 64; the number of selected variables s is chosen from {2, 4, 8, 16}. 2-SLR: we select the range [−B, B] with B ∈ {1, 2, 3}; the concentration radius is determined by the number of bins, chosen from {2, 4, 8, 16, 32}. M-SLR: we set B = 3 and choose the number of bins from {2, 4, 8, 16, 32}; the learning rate of the gradient step is η_t = 0.1/(1+t). LDPPROX: we set r = d log n, τ_1 = 4, τ_2 = 8; in simulations, where we know min_{β_j ≠ 0} |β_j| = 0.2, we set λ = 0.05; on real data, we set λ to the 10th lower quantile of the absolute fitted coefficients. LDPIHT: we select T ∈ {2, 5, 10, 20, 50}, η ∈ {0.01, 0.1, 1}, τ_1, τ_2 ∈ {2, 4, 8}, k ∈ {5, 10, 20, 50}. Lasso: we set n_alphas = 300, max_iter = 3000, and tol = 10⁻⁴. |
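The non-private Lasso baseline in the setup above is straightforward to reproduce with scikit-learn's `LassoCV`, using the reported settings (n_alphas = 300, max_iter = 3000, tol = 10⁻⁴). Below is a minimal sketch; the synthetic data generation is a hypothetical stand-in for the paper's datasets, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Hypothetical stand-in for the paper's data: a sparse linear model
# y = X @ beta + noise with s nonzero coefficients.
rng = np.random.default_rng(0)
n, d, s = 1000, 64, 4
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:s] = 0.2  # matches the simulation's min |beta_j| = 0.2 over nonzero j
y = X @ beta + 0.1 * rng.standard_normal(n)

# Settings as reported: n_alphas = 300, max_iter = 3000, tol = 1e-4.
lasso = LassoCV(n_alphas=300, max_iter=3000, tol=1e-4).fit(X, y)
print("selected support:", np.flatnonzero(lasso.coef_))
```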
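The reporting protocol (best result over a parameter grid, with each configuration averaged over at least 30 replications) can be written as a short loop. The sketch below assumes this reading; `run_once` is a hypothetical placeholder for one training/evaluation replication, and the grid shown is the one listed for M-SLR.

```python
import itertools
import numpy as np

def run_once(params, seed):
    # Hypothetical placeholder: train the estimator with `params` on one
    # seeded replication and return its error. A dummy value is used here.
    rng = np.random.default_rng(seed)
    return float(rng.random()) / params["n_bins"]

def best_over_grid(grid, n_reps=30):
    # Average each configuration's error over n_reps replications, then
    # report the best (lowest) average, per the stated protocol.
    averages = {}
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        averages[values] = np.mean([run_once(params, seed) for seed in range(n_reps)])
    best = min(averages, key=averages.get)
    return dict(zip(grid.keys(), best)), averages[best]

# Grid listed for M-SLR: B fixed at 3, number of bins in {2, 4, 8, 16, 32}.
best_params, best_error = best_over_grid({"B": [3], "n_bins": [2, 4, 8, 16, 32]})
print(best_params, best_error)
```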