Scaling Up Differentially Private LASSO Regularized Logistic Regression via Faster Frank-Wolfe Iterations

Authors: Edward Raff, Amol Khanna, Fred Lu

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results demonstrate that this procedure can reduce runtime by a factor of up to 2,200, depending on the value of the privacy parameter ϵ and the sparsity of the dataset. We test our algorithm on high-dimensional problems with up to 8.4 million datapoints and 20 million features on a single machine and find speedups ranging from 10 to 2,200 times that of the standard DP Frank-Wolfe method.
Researcher Affiliation | Industry | Edward Raff, Booz Allen Hamilton (raff_edward@bah.com); Amol Khanna, Booz Allen Hamilton (khanna_amol@bah.com); Fred Lu, Booz Allen Hamilton (lu_fred@bah.com)
Pseudocode | Yes | Algorithm 1: Standard Sparse-Aware Frank-Wolfe; Algorithm 2: Fast Sparse-Aware Frank-Wolfe for Linear Models; Algorithm 3: Fibonacci Heap Frank-Wolfe Queue Maintenance; Algorithm 4: Big-Step Little-Step Exponential Sampler. (A minimal illustrative sketch of the baseline DP Frank-Wolfe iteration appears after this table.)
Open Source Code | No | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36]. There is no explicit statement of code release or a repository link for the specific contributions of this paper.
Open Datasets | Yes | Table 2: Datasets used for evaluation. We focus on cases that are high-dimensional and sparse. (Table columns: Dataset, N, D; e.g., RCV1: N = 20,242, D = 47,236; News20: …)
Dataset Splits | No | The paper uses the phrase "training iterations" and discusses evaluating on "test datasets" but does not explicitly mention validation splits or their proportions.
Hardware Specification | Yes | All experiments were run on a machine with 12 CPU cores (though only one core was used), and 128GB of RAM.
Software Dependencies | Yes | Our code was written in Java due to the need for explicit looping, and implemented using the JSAT library [36].
Experiment Setup | Yes | Due to the computational requirements of running all tests, we fix the total number of iterations T = 4,000 and the maximum L1 norm for the Lasso constraint to be λ = 50 in all tests across all datasets. This value produces highly accurate models in all non-private cases, and the goal of our work is not to perform hyper-parameter tuning but to demonstrate that we have taken an already known algorithm with established convergence rates and made each iteration more efficient.
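
Algorithms 1 through 4 named above are not reproduced in this report. As a rough point of reference only, the following is a minimal, self-contained Java sketch of the baseline differentially private Frank-Wolfe iteration for L1-constrained (LASSO) logistic regression that the paper accelerates: each step selects a signed coordinate (a vertex of the L1 ball) with the exponential mechanism, implemented here via the Gumbel-max trick, and then takes a convex-combination step toward that vertex. It is not the authors' JSAT-based implementation or their sparse-aware variants; the class name, privacy-budget split, and noise calibration are illustrative assumptions.

import java.util.Random;

/**
 * Minimal sketch of the baseline DP Frank-Wolfe loop for L1-constrained
 * logistic regression. Illustrative only: not the paper's JSAT code and
 * not its faster sparse-aware variants.
 */
public class DpFrankWolfeSketch {

    /** Average logistic-loss gradient of the model w over the dataset (X, y), with y in {0, 1}. */
    static double[] gradient(double[][] X, int[] y, double[] w) {
        int n = X.length, d = w.length;
        double[] g = new double[d];
        for (int i = 0; i < n; i++) {
            double dot = 0.0;
            for (int j = 0; j < d; j++) dot += X[i][j] * w[j];
            double p = 1.0 / (1.0 + Math.exp(-dot));   // sigmoid prediction
            double err = p - y[i];
            for (int j = 0; j < d; j++) g[j] += err * X[i][j] / n;
        }
        return g;
    }

    /**
     * T iterations of DP Frank-Wolfe: the exponential mechanism picks one of the
     * 2*d vertices {+/- lambda * e_j} of the L1 ball using the negative gradient
     * as utility, then a standard convex-combination step is taken. The per-step
     * privacy budget and the factor of 2 in the score are schematic; the real
     * calibration follows the paper's sensitivity and composition analysis.
     */
    static double[] fit(double[][] X, int[] y, double lambda,
                        double epsilon, int T, Random rng) {
        int d = X[0].length;
        double[] w = new double[d];
        double epsPerStep = epsilon / T;               // naive composition for illustration
        for (int t = 0; t < T; t++) {
            double[] g = gradient(X, y, w);
            int bestIdx = 0;
            double bestScore = Double.NEGATIVE_INFINITY;
            for (int v = 0; v < 2 * d; v++) {          // all signed coordinates
                int j = v / 2;
                double sign = (v % 2 == 0) ? 1.0 : -1.0;
                double utility = -sign * lambda * g[j];            // descent utility
                double gumbel = -Math.log(-Math.log(rng.nextDouble()));
                double score = epsPerStep * utility / 2.0 + gumbel; // Gumbel-max sampling
                if (score > bestScore) { bestScore = score; bestIdx = v; }
            }
            int j = bestIdx / 2;
            double sign = (bestIdx % 2 == 0) ? 1.0 : -1.0;
            double step = 2.0 / (t + 2.0);             // standard Frank-Wolfe step size
            for (int k = 0; k < d; k++) w[k] *= (1.0 - step);
            w[j] += step * sign * lambda;
        }
        return w;
    }
}

Note that even this baseline scans all 2D candidate vertices on every iteration, which is the per-iteration cost the paper's Fibonacci heap queue maintenance and Big-Step Little-Step exponential sampler are designed to reduce on sparse data.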