Better Locally Private Sparse Estimation Given Multiple Samples Per User

Authors: Yuheng Ma, Ke Jia, Hanfang Yang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on both synthetic and real datasets demonstrate the superiority of the proposed methods.
Researcher Affiliation | Academia | School of Statistics, Renmin University of China; Center for Applied Statistics, Renmin University of China. Correspondence to: Hanfang Yang <hyang@ruc.edu.cn>.
Pseudocode | Yes | Algorithm 1: Two-round ULDP sparse estimation; Algorithm 2: LocalRnd (Bassily et al., 2020); Algorithm 3: FreqOracle (Bassily et al., 2020); Algorithm 4: HeavyHitter; Algorithm 5: ULDPSCO; Algorithm 6: Multi-round ULDP sparse linear regression; Algorithm 7: Range; Algorithm 8: Mean; Algorithm 9: ULDPMean.
Open Source Code | Yes | The code is publicly available on GitHub: https://github.com/Karlmyh/ULDP-SL
Open Datasets | Yes | Airline: the Airlines-Departure-Delay dataset, originally from the United States Department of Transportation and currently available on OpenML (LeDell, 2020). Loan: the Loan-Default-Prediction dataset, obtained from the training set of the Kaggle Loan Default Prediction challenge (DrivenData, 2021a). Mip: the MIP-2016-regression dataset, available on OpenML, comprising 1,090 instances with 144 attributes and 1 output attribute (Bergdoll, 2019). Taxi: the Taxi dataset, obtained from the Differential Privacy Temporal Map Challenge (DrivenData, 2021b). Wine: the Wine Quality dataset (Cortez et al., 2009) from the UCI Machine Learning Repository. Yolanda: the Yolanda dataset (Guyon et al., 2019). (See the data-loading sketch after the table.)
Dataset Splits | No | We do not perform any parameter selection (e.g., cross-validation or a validation set), since it is prohibitive in the locally private setting (Ma & Yang, 2024; Ma et al., 2024a) or would cost too much privacy budget (Papernot & Steinke, 2021).
Hardware Specification | Yes | All experiments are conducted on a machine with a 72-core Intel Xeon 2.60GHz processor and 128GB of main memory.
Software Dependencies | No | The conventional Lasso regressor is fitted using the LassoCV class in the scikit-learn package (Pedregosa et al., 2011). While scikit-learn is mentioned, a specific version number is not provided.
Experiment Setup | Yes | For each model, we report the best result over its parameter grids, determined based on the average of at least 30 replications. The number of screened variables is set to 64; the number of selected variables s is chosen from {2, 4, 8, 16}. 2-SLR: we select the range [−B, B] with B ∈ {1, 2, 3}; the concentration radius is determined by the number of bins, chosen from {2, 4, 8, 16, 32}. M-SLR: we set B = 3, select the number of bins from {2, 4, 8, 16, 32}, and set the gradient learning rate to ηt = 0.1/(1 + t). LDPPROX: we set r = d log n, τ1 = 4, τ2 = 8; in simulations, where we know min_{βj ≠ 0} |βj| = 0.2, we set λ = 0.05, while on real data we set λ to the 10th lower quantile of the absolute fitted coefficients. LDPIHT: we select T ∈ {2, 5, 10, 20, 50}, η ∈ {0.01, 0.1, 1}, τ1, τ2 ∈ {2, 4, 8}, k ∈ {5, 10, 20, 50}. Lasso: we set n_alphas = 300, max_iter = 3000, and tol = 10^−4. (See the configuration sketch after the table.)
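
To make the dataset provenance row actionable, here is a minimal data-loading sketch using scikit-learn's OpenML fetcher. The dataset name string is an assumption inferred from the description above (the exact OpenML identifier may differ), and this is not the authors' pipeline; their code lives at the GitHub link in the table.

    # Minimal sketch: fetching one of the OpenML-hosted datasets named above.
    # NOTE: the name string is an assumption; the exact OpenML identifier
    # may differ. This is not the authors' loading code.
    from sklearn.datasets import fetch_openml

    # MIP-2016-regression: reported as 1,090 instances, 144 attributes, 1 target.
    mip = fetch_openml(name="MIP-2016-regression", as_frame=True)
    X, y = mip.data, mip.target
    print(X.shape)  # expected (1090, 144) if the identifier matches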
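
To pin down the setup row, below is a minimal sketch of the reported configuration: the parameter grids, the M-SLR learning-rate schedule ηt = 0.1/(1 + t), and a Lasso baseline fitted with the stated LassoCV settings. The grids and LassoCV arguments are copied from the table; X, y and all other names are hypothetical placeholders, not the authors' implementation.

    # Sketch of the reported experiment configuration, assuming placeholder
    # data. Grids and LassoCV settings are taken from the setup row above.
    import numpy as np
    from sklearn.linear_model import LassoCV

    # Parameter grids reported for the private methods (best result per grid,
    # averaged over at least 30 replications).
    GRIDS = {
        "2-SLR":  {"B": [1, 2, 3], "n_bins": [2, 4, 8, 16, 32]},
        "M-SLR":  {"B": [3], "n_bins": [2, 4, 8, 16, 32]},
        "LDPIHT": {"T": [2, 5, 10, 20, 50], "eta": [0.01, 0.1, 1],
                   "tau1": [2, 4, 8], "tau2": [2, 4, 8], "k": [5, 10, 20, 50]},
    }
    S_GRID = [2, 4, 8, 16]   # number of selected variables s
    N_SCREENED = 64          # number of screened variables

    def mslr_learning_rate(t: int) -> float:
        """M-SLR gradient step size eta_t = 0.1 / (1 + t), as stated above."""
        return 0.1 / (1 + t)

    # Non-private Lasso baseline with the reported LassoCV settings,
    # on hypothetical synthetic data.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))   # placeholder design matrix
    beta = np.zeros(50)
    beta[:4] = 0.2                       # sparse truth with min |beta_j| = 0.2
    y = X @ beta + 0.1 * rng.standard_normal(200)

    lasso = LassoCV(n_alphas=300, max_iter=3000, tol=1e-4).fit(X, y)
    print("selected support:", np.flatnonzero(lasso.coef_))

The best-over-grid reporting described above would wrap such a fit in an outer loop over GRIDS and at least 30 random replications per configuration.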